SCHIZOPHRENIA ASSOCIATED GENES, PROTEINS AND BIALLELIC MARKERS



FIELD OF THE INVENTION 

The invention concerns the human sbg1, g34665, sbg2, g35017 and g35018 genes,
polynucleotides, polypeptides and biallelic markers, and human chromosome 13q31-q33 biallelic
markers. The invention also concerns the association established between schizophrenia and
bipolar disorder, on the one hand, and the biallelic markers and the sbg1, g34665, sbg2, g35017
and g35018 genes and nucleotide sequences, on the other. The invention provides means to
identify compounds useful in the treatment of schizophrenia, bipolar disorder and related
diseases, means to determine the predisposition of individuals to these diseases, and means for
their diagnosis and prognosis.

BACKGROUND OF THE INVENTION 
Advances in the technological armamentarium available to basic and clinical 
investigators have enabled increasingly sophisticated studies of brain and nervous system 
function in health and disease. Numerous hypotheses, both neurobiological and
pharmacological, have been advanced with respect to the neurochemical and genetic mechanisms
involved in central nervous system (CNS) disorders, including psychiatric disorders and
neurodegenerative diseases. However, CNS disorders have complex and poorly understood
etiologies, as well as symptoms that are overlapping, poorly characterized, and difficult to
measure. As a result, future treatment regimens and drug development efforts will need to be
more sophisticated and focused on multigenic causes, and will require new assays to segment
disease populations and to provide more accurate diagnostic and prognostic information on
patients suffering from CNS disorders.

Neurological Basis of CNS Disorders 

Neurotransmitters serve as signal transmitters throughout the body. Diseases that affect 
neurotransmission can therefore have serious consequences. For example, for over 30 years the 
leading theory to explain the biological basis of many psychiatric disorders such as depression 
has been the monoamine hypothesis. This theory proposes that depression is partially due to a 
deficiency in one of the three main biogenic monoamines, namely dopamine, norepinephrine 
and/or serotonin. 

In addition to the monoamine hypothesis, numerous arguments point to the value of taking
into account the overall function of the brain rather than considering only a single neuronal
system. In this context, the value of dual specific actions on the central aminergic systems,
including second and third messenger systems, has now emerged.

Endocrine Basis of CNS Disorders 

It is furthermore apparent that the main monoamine systems, namely dopamine, 
norepinephrine and serotonin, do not completely explain the pathophysiology of many CNS 
disorders. In particular, it is clear that CNS disorders may have an endocrine component; the 
hypothalamic-pituitary-adrenal (HPA) axis, including the effects of corticotrophin-releasing 
factor and glucocorticoids, plays an important role in the pathophysiology of CNS disorders. 

In the HPA axis, the hypothalamus lies at the top of the hierarchy regulating hormone
secretion. It manufactures and releases peptides (small chains of amino acids) that act on the
pituitary, at the base of the brain, stimulating or inhibiting the pituitary's release of various
hormones into the blood. These hormones, among them growth hormone, thyroid-stimulating
hormone and adrenocorticotropic hormone (ACTH), control the release of other hormones from
target glands. In addition to acting outside the nervous system, the hormones released in
response to pituitary hormones also feed back to the pituitary and hypothalamus, where they
deliver inhibitory signals that serve to limit excess hormone biosynthesis.

CNS Disorders

Neurotransmitter and hormonal abnormalities are implicated in disorders of movement 
(e.g. Parkinson's disease, Huntington's disease, motor neuron disease, etc.), disorders of mood 
(e.g. unipolar depression, bipolar disorder, anxiety, etc.) and diseases involving the intellect 
(e.g. Alzheimer's disease, Lewy body dementia, schizophrenia, etc.). In addition, these systems 
have been implicated in many other disorders, such as coma, head injury, cerebral infarction, 
epilepsy, alcoholism and the mental retardation states of metabolic origin seen particularly in 
childhood. 

Genetic Analysis of Complex Traits 

Until recently, the identification of genes linked with detectable traits has relied mainly 
on a statistical approach called linkage analysis. Linkage analysis is based upon establishing a 
correlation between the transmission of genetic markers and that of a specific trait throughout 
generations within a family. Linkage analysis involves the study of families with multiple
affected individuals and is useful in the detection of inherited traits that are caused by a
single gene, or possibly a very small number of genes. However, linkage studies have proven
difficult when applied to complex genetic traits. Most traits of medical relevance do not follow
simple Mendelian monogenic inheritance. Nevertheless, complex diseases often aggregate in
families, which suggests that there is a genetic component to be found. Such complex traits are
often due to the combined action of multiple genes as well as environmental factors. Such
complex traits include susceptibilities to heart disease, hypertension, diabetes, cancer and
inflammatory diseases. Drug efficacy, response and tolerance/toxicity can also be considered as
multifactorial traits involving a genetic component in the same way as complex diseases.

WO 00/58510 PCT/IB00/00435

Linkage analysis cannot be applied to the study of traits for which no large informative
families are available. Moreover, because of their low penetrance, such complex traits do not
segregate in a clear-cut Mendelian manner as they are passed from one generation to the next.
Attempts to map such diseases have been plagued by inconclusive results, demonstrating the
need for more sophisticated genetic tools.
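To make the idea of linkage analysis concrete, the sketch below computes a LOD (logarithm of odds) score, the usual measure of linkage between a marker and a trait. It is a minimal illustration under strong simplifying assumptions: phase-known, fully informative meioses in which each scored offspring is classed as recombinant or non-recombinant, and no penetrance model. The function name `lod_score` and the counts are hypothetical and are not taken from the invention.

```python
import math


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta.

    Compares the log10 likelihood of the observed meioses under linkage
    (recombination fraction theta) with that under no linkage (theta = 0.5).
    Assumes every meiosis is fully informative (a simplifying assumption).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = recombinants + non_recombinants
    log_l_linked = recombinants * math.log10(theta) + non_recombinants * math.log10(1.0 - theta)
    log_l_unlinked = n * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# Hypothetical family data: 1 recombinant among 10 scored meioses.
# Scan theta = 0.01 .. 0.50 and keep the (LOD, theta) pair with the highest LOD.
best = max((lod_score(1, 9, t / 100), t / 100) for t in range(1, 51))
print(f"max LOD = {best[0]:.2f} at theta = {best[1]:.2f}")
```

With one recombinant among ten scored meioses the maximum LOD is about 1.6 at a recombination fraction of 0.1, below the conventional threshold of 3 for declaring linkage, which illustrates why linkage studies need many informative meioses and therefore large families.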

Knowledge of genetic variation in the neuronal and endocrine systems is important for 
understanding why some people are more susceptible to disease or respond differently to 
treatments. Ways to identify genetic polymorphisms, and to analyze how they affect and predict
disease susceptibility and response to treatment, are needed.

Although the genes involved in the neuronal and endocrine systems represent major drug
targets and are of high relevance to pharmaceutical research, we still have scant knowledge
concerning the extent and nature of sequence variation in these genes and their regulatory
elements. Where polymorphisms have been identified, the relevance of the variation is rarely
understood. While polymorphisms hold promise for use as genetic markers in determining which
genes contribute to multigenic or quantitative traits, suitable markers and suitable methods for
exploiting those markers have not been found and brought to bear on the genes related to
disorders of the brain and nervous system.

These goals can be accomplished by using genetic association analysis to detect markers
that predict susceptibility to these traits. Recently, advances in the fields of genetics and
molecular biology have allowed the identification of forms, or alleles, of human genes that lead
to diseases. Most of the genetic variations responsible for human diseases identified so far
belong to the class of single gene disorders. As this name implies, the development of single
gene disorders is determined, or largely influenced, by the alleles of a single gene. The alleles
that cause these disorders are, in general, highly deleterious (and highly penetrant) to
individuals who carry them. Therefore, these alleles and their associated diseases, with some
exceptions, tend to be very rare in the human population. In contrast, most common diseases
and non-disease traits, such as a physiological response to a pharmaceutical agent, can be
viewed as the result of many complex factors. These can include environmental exposures
(toxins, allergens, infectious agents, climate, and trauma) as well as multiple genetic factors.

Association studies seek to analyze the distributions of chromosomes that have occurred
in populations of unrelated (at least not directly related) individuals. An assumption in this type
of study is that genetic alleles that result in susceptibility to a common trait arose by ancient
mutational events on chromosomes that have been passed down through many generations in
the population. These alleles can become common throughout the population in part because
the trait they influence, if deleterious, is only expressed in a fraction of the individuals who
carry them. Identification of these "ancestral" chromosomes is made difficult by the fact that
genetic markers are likely to have become separated from the trait susceptibility allele through
the process of recombination, except in regions of DNA that immediately surround the allele.
The identities of genetic markers contained within the fragments of DNA surrounding a
susceptibility allele will be the same as those on the ancestral chromosome on which the
allele arose. Therefore, individuals from the population who express a complex trait might be
expected to carry the same set of genetic markers in the vicinity of a susceptibility allele more
often than those who do not express the trait; that is, these markers will show an association
with the trait.
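The association principle described above is commonly tested, marker by marker, by comparing allele counts between affected individuals (cases) and unaffected individuals (controls). The sketch below shows one standard approach, a Pearson chi-square test with one degree of freedom on a 2x2 table of allele counts at a single biallelic marker, together with the allelic odds ratio. It is an illustrative assumption, not necessarily the statistic used in the invention; the function name `allelic_chi_square` and the counts are hypothetical, and the haplotype-based analyses referred to in the figures require more elaborate methods.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    case_counts and control_counts are (allele 1 count, allele 2 count);
    each genotyped individual contributes two alleles at a biallelic marker.
    Returns (chi2, p_value, odds_ratio). No continuity correction is applied.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    # Sum (observed - expected)^2 / expected over the four cells.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # Allelic odds ratio: odds of allele 1 in cases relative to controls.
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts at one marker for 200 cases and 200 controls.
chi2, p, odds_ratio = allelic_chi_square(case_counts=(240, 160), control_counts=(200, 200))
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, allelic OR = {odds_ratio:.2f}")
```

Counting alleles rather than individuals doubles the effective sample size but assumes that the two alleles carried by each individual are independent (Hardy-Weinberg equilibrium); the odds ratio is undefined when a cell count is zero.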

Schizophrenia 

Schizophrenia is one of the most severe and debilitating of the major psychiatric 
diseases. It usually starts in late adolescence or early adult life and often becomes chronic and 
disabling. Men and women are at equal risk of developing this illness; however, most males
become ill between 16 and 25 years old, while most females develop symptoms between 25 and 30.
People with schizophrenia often experience both "positive" symptoms (e.g., delusions, 
hallucinations, disorganized thinking, and agitation) and "negative" symptoms (e.g., lack of 
drive or initiative, social withdrawal, apathy, and emotional unresponsiveness). 

Schizophrenia affects 1% of the world population. There are an estimated 45 million 
people with schizophrenia in the world, with more than 33 million of them in the developing 
countries. This disease places a heavy burden on the patient's family and relatives, both in 
terms of the direct and indirect costs involved and the social stigma associated with the illness, 
sometimes over generations. Such stigma often leads to isolation and neglect. 

Moreover, schizophrenia accounts for one fourth of all mental health costs and takes up 
one in three psychiatric hospital beds. Most schizophrenia patients are never able to work. The 
cost of schizophrenia to society is enormous. In the United States, for example, the direct cost 
of treatment of schizophrenia has been estimated to be close to 0.5% of the gross national 
product. Standardized mortality ratios (SMRs) for schizophrenic patients are estimated to be
two to four times higher than those of the general population, and their overall life expectancy
is 20% shorter than that of the general population. The most common cause of death among
schizophrenic patients is suicide (in 10% of patients), which represents a 20 times higher risk
than for the general population. Deaths from heart disease and from diseases of the respiratory
and digestive systems are also increased among schizophrenic patients.

Bipolar Disorder

Bipolar disorders are relatively common disorders with severe and potentially disabling
effects. In addition to the severe effects on patients' social development, suicide completion
rates among bipolar patients are reported to be about 15%.

Bipolar disorders are characterized by phases of excitement, often accompanied by phases of
depression; the excitement phases, referred to as mania or hypomania, and the depressive phases
can alternate or occur in various admixtures, and can occur to different degrees of severity and
over varying time periods. Because bipolar disorders can exist in different forms and display
different symptoms, the classification of bipolar disorder has been the subject of extensive
studies, resulting in the definition of bipolar disorder subtypes and a widening of the overall
concept to include patients previously thought to be suffering from different disorders. Bipolar
disorders often share certain clinical signs, symptoms, treatments and neurobiological features
with psychotic illnesses in general and therefore present a challenge to the psychiatrist to make
an accurate diagnosis. Furthermore, because the course of bipolar disorders and various mood
and psychotic disorders can differ greatly, it is critical to characterize the illness as early as
possible in order to offer means to manage the illness over the long term.

Bipolar disorders appear in about 1 3% of the population and have been reported to
constitute about half of the mood disorders seen in a psychiatric clinic. The prevalence of
bipolar disorders has been found to vary with gender depending on the type of disorder; for
example, bipolar disorder I is found equally among men and women, while bipolar disorder II is
reportedly more common in women. The age of onset of bipolar disorders is typically in the
teenage years, and diagnosis is typically made in the patient's early twenties. Bipolar disorders
also occur among the elderly, generally as a result of a medical or neurological disorder.

The costs of bipolar disorders to society are enormous. The mania associated with the
disease impairs performance, can cause psychosis, and often results in hospitalization. This
disease places a heavy burden on the patient's family and relatives, both in terms of the direct
and indirect costs involved and the social stigma associated with the illness, sometimes over
generations. Such stigma often leads to isolation and neglect. Furthermore, the earlier the onset,
the more severe are the effects of interrupted education and social development.

The DSM-IV classification of bipolar disorder distinguishes among four types of
disorders based on the degree and duration of mania or hypomania, as well as two types of
disorders that are typically associated with medical conditions or their treatments, or with
substance abuse. Mania is recognized by elevated, expansive or irritable mood as well as by
distractibility, impulsive behavior, increased activity, grandiosity, elation, racing thoughts, and
pressured speech. Of the four types of bipolar disorder characterized by the particular degree
and duration of mania, DSM-IV includes:

- bipolar disorder I, including patients displaying mania for at least one week;

- bipolar disorder II, including patients displaying hypomania for at least 4 days, 
characterized by milder symptoms of excitement than mania, who have not previously displayed 
mania, and have previously suffered from episodes of major depression; 

- bipolar disorder not otherwise specified (NOS), including patients otherwise 
displaying features of bipolar disorder II but not meeting the 4 day duration for the excitement 
phase, or who display hypomania without an episode of major depression; and 

- cyclothymia, including patients who show numerous manic and depressive symptoms 
that do not meet the criteria for hypomania or major depression, but which are displayed for 
over two years without a symptom-free interval of more than two months. 

The remaining two types of bipolar disorder as classified in DSM-IV are disorders
evident in or caused by various medical disorders and their treatments, and disorders involving
or related to substance abuse. Medical disorders which can cause bipolar disorders typically
include endocrine disorders and cerebrovascular injuries, and treatments or substances known to
cause bipolar disorder include glucocorticoids and the abuse of stimulants. The disorder
associated with the use or abuse of a substance is referred to as "substance induced mood
disorder with manic or mixed features".

Diagnosis of bipolar disorder can be very challenging. One particularly troublesome
difficulty is that some patients exhibit mixed states, simultaneously manic and dysphoric or
depressive, but do not fall into the DSM-IV classification because not all required criteria for
mania and major depression are met daily for at least one week. Other difficulties include
classification of patients in the DSM-IV groups based on duration of phase, since patients often
cycle between excited and depressive episodes at different rates. In particular, it is reported that
the use of antidepressants may alter the course of the disease for the worse by causing "rapid
cycling". Also making diagnosis more difficult is the fact that bipolar patients, particularly at
what is known as Stage III mania, share symptoms of disorganized thinking and behavior with
schizophrenic patients. Furthermore, psychiatrists must distinguish between agitated
depression and mixed mania; it is common that patients with major depression (14 days or
more) exhibit agitation, resulting in bipolar-like features. A yet further complicating factor is
that bipolar patients have an exceptionally high rate of substance abuse, particularly alcohol
abuse. While the prevalence of mania in alcoholic patients is low, it is well known that
substance abusers can show excited symptoms. Difficulties therefore result for the diagnosis of
bipolar patients with substance abuse.

Treatment

As there are currently no cures for bipolar disorder or schizophrenia, the objective of 
treatment is to reduce the severity of the symptoms, if possible to the point of remission. Due to 
the similarities in symptoms, schizophrenia and bipolar disorder are often treated with some of 
the same medicaments. Both diseases are often treated with antipsychotics and neuroleptics. 

For schizophrenia, for example, antipsychotic medications are the most common and 
most valuable treatments. There are four main classes of antipsychotic drugs which are 
commonly prescribed for schizophrenia. The first, neuroleptics, exemplified by chlorpromazine 
(Thorazine), has revolutionized the treatment of schizophrenic patients by reducing positive 
(psychotic) symptoms and preventing their recurrence. Patients receiving chlorpromazine have 
been able to leave mental hospitals and live in community programs or their own homes. But 
these drugs are far from ideal. Some 20% to 30% of patients do not respond to them at all, and 
others eventually relapse. These drugs were named neuroleptics because they produce serious 
neurological side effects, including rigidity and tremors in the arms and legs, muscle spasms, 
abnormal body movements, and akathisia (restless pacing and fidgeting). These side effects are 
so troublesome that many patients simply refuse to take the drugs. Besides, neuroleptics do not 
improve the so-called negative symptoms of schizophrenia and the side effects may even 
exacerbate these symptoms. Thus, despite the clear beneficial effects of neuroleptics, even 
some patients who have a good short-term response will ultimately deteriorate in overall 
functioning. 

The well known deficiencies in the standard neuroleptics have stimulated a search for 
new treatments and have led to a new class of drugs termed atypical neuroleptics. The first
atypical neuroleptic, clozapine, is effective for about one third of patients who do not respond
to standard neuroleptics. It seems to reduce negative as well as positive symptoms, or at least to
exacerbate negative symptoms less than standard neuroleptics do. Moreover, it has beneficial
effects on overall functioning and may reduce the chance of suicide in schizophrenic patients. It 
does not produce the troubling neurological symptoms of the standard neuroleptics, or raise 
blood levels of the hormone prolactin, excess of which may cause menstrual irregularities and 
infertility in women, impotence or breast enlargement in men. Many patients who cannot 
tolerate standard neuroleptics have been able to take clozapine. However, clozapine has serious 
limitations. It was originally withdrawn from the market because it can cause agranulocytosis, a 
potentially lethal inability to produce white blood cells. Agranulocytosis remains a threat that 
requires careful monitoring and periodic blood tests. Clozapine can also cause seizures and 
other disturbing side effects (e.g., drowsiness, lowered blood pressure, drooling, bed-wetting, 
and weight gain). Thus it is usually taken only by patients who do not respond to other drugs.

Researchers have developed a third class of antipsychotic drugs that have the virtues of 
clozapine without its defects. One of these drugs is risperidone (Risperdal). Early studies 
suggest that it is as effective as standard neuroleptic drugs for positive symptoms and may be 
somewhat more effective for negative symptoms. It produces more neurological side effects 
than clozapine but fewer than standard neuroleptics. However, it raises prolactin levels. 
Risperidone is now prescribed for a broad range of psychotic patients, and many clinicians seem 
to use it before clozapine for patients who do not respond to standard drugs, because they regard 
it as safer. Another new drug is olanzapine (Zyprexa), which is at least as effective as standard
drugs for positive symptoms and more effective for negative symptoms. It has few neurological
side effects at ordinary clinical doses, and it does not significantly raise prolactin levels.
Although it does not produce clozapine's most troubling side effects, including
agranulocytosis, some patients taking olanzapine may become sedated or dizzy, develop dry 
mouth, or gain weight. In rare cases, liver function tests become transiently abnormal. 

Outcome studies in schizophrenia are usually based on hospital treatment studies and 
may not be representative of the population of schizophrenia patients. At the extremes of 
outcome, 20% of patients seem to recover completely after one episode of psychosis, whereas
14-19% of patients develop a chronic unremitting psychosis and never fully recover. In general,
clinical outcome at five years seems to follow the rule of thirds, with about 35% of patients in
the poor outcome category, 36% in the good outcome category, and the remainder with an
intermediate outcome. Prognosis in schizophrenia does not seem to worsen after five years.

Whatever the reasons, there is increasing evidence that leaving schizophrenia untreated
for long periods early in the course of the illness may negatively affect the outcome. However,
the use of drugs is often delayed for patients experiencing a first episode of the illness. The
patients may not realize that they are ill, or they may be afraid to seek help; family members
sometimes hope the problem will simply disappear or cannot persuade the patient to seek
treatment; clinicians may hesitate to prescribe antipsychotic medications when the diagnosis is
uncertain because of potential side effects. Indeed, at the first manifestation of the disease,
schizophrenia is difficult to distinguish from bipolar manic-depressive disorders, severe
depression, drug-related disorders, and stress-related disorders. Since the optimum treatments
differ among these diseases, the long-term prognosis of the disorder also depends on the
treatment chosen at its onset.

For both schizophrenia and bipolar disorder, all the known molecules used for treatment
have side effects and act only against the symptoms of the disease. There is a strong need for
new molecules without associated side effects and directed against targets which are involved
in the causal mechanisms of schizophrenia and bipolar disorder. Therefore, tools facilitating the
discovery and characterization of these targets are necessary and useful.

Schizophrenia and bipolar disorder are now considered to be brain diseases, and 
emphasis is placed on biological determinants in researching the conditions. In the case of 
schizophrenia, neuroimaging and neuropathological studies have shown evidence of brain
abnormalities in schizophrenic patients. The timing of these pathological changes is unclear,
but they are likely to reflect a defect in early brain development. Profound changes have also occurred in
hypotheses concerning neurotransmitter abnormalities in schizophrenia. The dopamine 
hypothesis has been extensively revised and is no longer considered as a primary causative 
model. 

The aggregation of schizophrenia and bipolar disorder in families, the evidence from 
twin and adoption studies, and the lack of variation in incidence worldwide, indicate that 
schizophrenia and bipolar disorder are primarily genetic conditions, although environmental risk 
factors are also involved at some level as necessary, sufficient, or interactive causes. For
example, schizophrenia occurs in 1% of the general population. However, if there is one
grandparent with schizophrenia, the risk of getting the illness increases to about 3%; with one
parent with schizophrenia, to about 10%. When both parents have schizophrenia, the risk rises to
approximately 40%.
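The familial risks quoted above can be summarized as the ratio of the risk in a given relative category to the population prevalence, in the spirit of the recurrence risk ratio (lambda) used in genetic epidemiology. The short sketch below is a simple illustration that uses only the figures stated in this paragraph; the function name `risk_ratio` is hypothetical.

```python
POPULATION_PREVALENCE = 0.01  # schizophrenia in the general population (1%)

# Approximate risks quoted in the text for individuals with affected relatives.
FAMILIAL_RISKS = {
    "one affected grandparent": 0.03,
    "one affected parent": 0.10,
    "two affected parents": 0.40,
}


def risk_ratio(risk_in_group: float, prevalence: float) -> float:
    """Ratio of the risk in a relative category to the population prevalence."""
    return risk_in_group / prevalence


ratios = {group: risk_ratio(risk, POPULATION_PREVALENCE) for group, risk in FAMILIAL_RISKS.items()}
for group, ratio in ratios.items():
    print(f"{group}: about {ratio:.0f}-fold the population risk")
```

The roughly 3-, 10- and 40-fold increases show why familial aggregation is taken as evidence of a genetic component, although environment shared within families can also contribute to such ratios.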

Consequently, there is a strong need to identify genes involved in schizophrenia and 
bipolar disorder. The knowledge of these genes will allow researchers to understand the 
etiology of schizophrenia and bipolar disorder and could lead to drugs and medications which 
are directed against the cause of the diseases, not just against their symptoms. 

There is also a great need for new methods for detecting a susceptibility to 
schizophrenia and bipolar disorder, as well as for preventing or following up the development of 
the disease. Diagnostic tools could also prove extremely useful. Indeed, early identification of 
subjects at risk of developing schizophrenia would enable early and/or prophylactic treatment to 
be administered. Moreover, accurate assessments of the eventual efficacy of a medicament as
well as the patient's eventual tolerance to it may enable clinicians to enhance the benefit/risk
ratio of schizophrenia and bipolar disorder treatment regimes.

SUMMARY OF THE INVENTION 
The present invention stems from the identification of novel polymorphisms including 
biallelic markers located on the human chromosome 13q31-q33 locus, the identification and 
characterization of novel schizophrenia-related genes located on the human chromosome 13q31- 
q33 locus, and from the identification of genetic associations between alleles of biallelic 
markers located on the human chromosome 13q31-q33 locus and disease, as confirmed and
characterized in a panel of human subjects. The invention furthermore provides a fine structure
map of the region which includes the schizophrenia-associated gene sequences.

The present invention pertains to nucleic acid molecules comprising the genomic
sequences of novel human genes encoding sbg1, g34665, sbg2, g35017 and g35018 proteins,
proteins encoded thereby, as well as antibodies thereto. The sbg1, g34665, sbg2, g35017 and
g35018 genomic sequences may also comprise regulatory sequences located upstream (5'-end)
and downstream (3'-end) of the transcribed portion of said genes, these regulatory sequences
being also part of the invention. The invention also deals with the cDNA sequences encoding the
sbg1 and g35018 proteins.

Oligonucleotide probes or primers hybridizing specifically with a sbg1, g34665, sbg2,
g35017 or g35018 genomic or cDNA sequence are also part of the present invention, as well as 
DNA amplification and detection methods using said primers and probes. 

A further object of the invention consists of recombinant vectors comprising any of the
nucleic acid sequences described above, and in particular of recombinant vectors comprising a
sbg1, g34665, sbg2, g35017 or g35018 regulatory sequence or a sequence encoding a sbg1,
g34665, sbg2, g35017 or g35018 protein, as well as of host cells and transgenic non-human
animals comprising said nucleic acid sequences or recombinant vectors.

The invention also concerns biallelic markers of the sbg1, g34665, sbg2, g35017 or
g35018 genes and the use thereof. Included are probes and primers for use in genotyping
biallelic markers of the invention.

An embodiment of the invention encompasses any polynucleotide of the invention
attached to a solid support; said polynucleotide may comprise a sequence disclosed in the present
specification; optionally, said polynucleotide may comprise, consist of, or consist essentially of
any polynucleotide described in the present specification; optionally, said determining may be
performed in a hybridization assay, sequencing assay, microsequencing assay, or an
enzyme-based mismatch detection assay; optionally, said polynucleotide may be attached to a
solid support, array, or addressable array; optionally, said polynucleotide may be labeled.

Finally, the invention is directed to drug screening assays and methods for the screening
of substances for the treatment of schizophrenia, bipolar disorder or a related CNS disorder
based on the role of sbg1, g34665, sbg2, g35017 and g35018 nucleotides and polynucleotides in
disease. One object of the invention deals with animal models, including mouse and non-human
primate models, of schizophrenia, bipolar disorder or a related CNS disorder based on the role
of sbg1 in disease. The invention is also directed to methods for the screening of substances or
molecules that inhibit the expression of sbg1, g34665, sbg2, g35017 or g35018, as well as to
methods for the screening of substances or molecules that interact with a sbg1, g34665, sbg2,
g35017 or g35018 polypeptide, or that modulate the activity of a sbg1, g34665, sbg2, g35017 or
g35018 polypeptide.

As noted above, certain aspects of the present invention stem from the identification of
genetic associations between schizophrenia and bipolar disorder and alleles of biallelic markers
located on the human chromosome 13q31-q33 region, and more particularly on a subregion
thereof referred to herein as Region D. The invention provides appropriate tools for
establishing further genetic associations between alleles of biallelic markers on the
13q31-13q33 locus and either side effects or benefit resulting from the administration of agents
acting on schizophrenia or bipolar disorder, or on schizophrenia or bipolar disorder symptoms,
including agents like chlorpromazine, clozapine, risperidone, olanzapine, sertindole, quetiapine
and ziprasidone.

The invention provides appropriate tools for establishing further genetic associations
between alleles of biallelic markers on the 13q31-13q33 locus and a trait. Methods and
products are provided for the molecular detection of a genetic susceptibility in humans to
schizophrenia and bipolar disorder. They can be used for diagnosis, staging, prognosis and
monitoring of these diseases, which processes can be further included within treatment
approaches. The invention also provides for the efficient design and evaluation of suitable
therapeutic solutions, including individualized strategies for optimizing drug usage, and
screening of potential new medicament candidates.

Additional embodiments are set forth in the Detailed Description of the Invention and in 
the Examples. 



BRIEF DESCRIPTION OF THE FIGURES 
Figure 1 is a diagram showing the exon structure of the sbgl gene.

Figure 2 is a table demonstrating the statistical significance of allelic frequencies of 
selected chromosome 13q31-q33 biallelic markers of the invention in sporadic and familial 
French Canadian schizophrenia cases and controls. 

Figure 3 is a table demonstrating the results of a haplotype association analysis between
total French Canadian schizophrenia cases and haplotypes which consist of chromosome 13q31-q33
biallelic markers of the invention.

Figure 4 is a table showing the involvement of selected biallelic markers of the 
invention in statistically significant haplotypes. 

Figure 5 is a table demonstrating the results of a haplotype association analysis between
French Canadian schizophrenia cases and haplotypes which consist of chromosome 13q31-q33
biallelic markers of the invention.

Figure 6 is a table demonstrating the results of a haplotype association analysis between 
French Canadian schizophrenia cases and haplotypes which consist of chromosome 13q31-q33 
biallelic markers of the invention. 
Figures 7A and 7B show the results of a haplotype association analysis (Omnibus LR

test value distribution) between schizophrenia cases and haplotypes comprising Region D 
biallelic markers of the invention. 

Figures 8A and 8B show the results of a haplotype association analysis (HaploMaxM
test value distribution) between schizophrenia cases and haplotypes comprising Region D
biallelic markers of the invention.

Figures 9A and 9B show the results of a haplotype association analysis (Omnibus LR 
test value distribution) between bipolar disorder cases and haplotypes comprising Region D 
biallelic markers of the invention. 

Figures 10A and 10B show the results of a haplotype association analysis (HaploMaxM
test value distribution) between bipolar disorder cases and haplotypes comprising Region D
biallelic markers of the invention.

Figures 11A and 11B show the results of a haplotype association analysis (HaploMaxS
test value distribution) between bipolar disorder cases and haplotypes comprising Region D 
biallelic markers of the invention. 
Figure 12 shows a comparison of the number of significant single and multipoint
biallelic marker analyses in subregions D1 to D4 of Region D in French Canadian samples.

Figure 13 shows a summary of the number of significant single and multipoint biallelic 
marker analyses across Region D in French Canadian samples. 

Figure 14 shows a comparison of the number of significant single and multipoint
biallelic marker analyses in subregions D1 to D4 of Region D in United States schizophrenia
samples.

Figure 15 shows a summary of the number of significant single and multipoint biallelic 
marker analyses across Region D in United States schizophrenia samples. 

Figure 16 shows a comparison of the number of significant single and multipoint
biallelic marker analyses in subregions D1 to D4 of Region D in Argentinian bipolar disorder
samples.

Figure 17 shows a summary of the number of significant single and multipoint biallelic 
marker analyses across Region D in Argentinian bipolar disorder samples. 

Figure 18 shows the effect of injection of an sbgl peptide on locomotor activity and
stereotypy of mice.

Figure 19 is a block diagram of an exemplary computer system.

Figure 20 is a flow diagram illustrating one embodiment of a process 200 for comparing a 
new nucleotide or protein sequence with a database of sequences in order to determine the 
homology levels between the new sequence and the sequences in the database. 

Figure 21 is a flow diagram illustrating one embodiment of a process 250 in a computer 
for determining whether two sequences are homologous. 

Figure 22 is a flow diagram illustrating one embodiment of an identifier process 300 for
detecting the presence of a feature in a sequence. 

BRIEF DESCRIPTION OF THE SEQUENCES PROVIDED IN THE 
SEQUENCE LISTING 
SEQ ID No. 1 contains approximately 319 kb of genomic nucleotide sequence
comprising the sbgl, g34665, sbg2, g35017 and g35018 nucleic acid sequences, the biallelic
markers A1 to A360 and the polymorphisms A361 to A489 located on the human chromosome
13q31-q33 locus.

SEQ ID Nos. 2 to 26 contain cDNA sequences of the sbgl gene. 

SEQ ID Nos. 27 to 35 contain amino acid sequences of sbgl polypeptides, encoded by 
cDNAs of SEQ ID Nos. 2 to 26. 

SEQ ID Nos. 36 to 40 contain cDNA sequences of the g35018 gene.
SEQ ID Nos. 41 to 43 contain amino acid sequences of g35018 polypeptides.
SEQ ID Nos. 44 to 53 contain primers used to isolate sbgl cDNAs.
SEQ ID Nos. 54 to 111 contain genomic nucleotide sequences comprising exons of the
sbgl gene from several different primates.

SEQ ID Nos. 112 to 229 respectively contain the nucleotide sequences of the amplicons
which comprise the biallelic markers A243 to A360 located on the human chromosome 13q31-q33
locus.

SEQ ID No. 230 contains a primer containing the additional PU 5' sequence described
further in Example 2.

SEQ ID No. 231 contains a primer containing the additional RP 5' sequence described
further in Example 2.

In accordance with the regulations relating to Sequence Listings, the following codes
have been used in the Sequence Listing to indicate the locations of biallelic markers within the
sequences and to identify each of the alleles present at the polymorphic base. The code "r" in
the sequences indicates that one allele of the polymorphic base is a guanine, while the other
allele is an adenine. The code "y" in the sequences indicates that one allele of the polymorphic
base is a thymine, while the other allele is a cytosine. The code "m" in the sequences indicates
that one allele of the polymorphic base is an adenine, while the other allele is a cytosine. The
code "k" in the sequences indicates that one allele of the polymorphic base is a guanine, while
the other allele is a thymine. The code "s" in the sequences indicates that one allele of the
polymorphic base is a guanine, while the other allele is a cytosine. The code "w" in the
sequences indicates that one allele of the polymorphic base is an adenine, while the other allele
is a thymine.
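The two-allele codes listed above follow the IUPAC nucleotide ambiguity convention. As an illustration only, a small Python lookup can decode them and locate the polymorphic positions in a sequence string. The function names are hypothetical, and the sketch assumes sequences are plain strings with lowercase bases, as in the Sequence Listing.

```python
# Two-allele ambiguity codes as defined in the Sequence Listing paragraph above.
IUPAC_BIALLELIC = {
    "r": frozenset("GA"),
    "y": frozenset("TC"),
    "m": frozenset("AC"),
    "k": frozenset("GT"),
    "s": frozenset("GC"),
    "w": frozenset("AT"),
}


def alleles_at(code: str) -> frozenset:
    """Return the two alleles encoded by an ambiguity code."""
    try:
        return IUPAC_BIALLELIC[code.lower()]
    except KeyError:
        raise ValueError(f"not a two-allele ambiguity code: {code!r}") from None


def code_for(allele1: str, allele2: str) -> str:
    """Return the ambiguity code for an unordered pair of distinct alleles."""
    pair = frozenset((allele1.upper(), allele2.upper()))
    for code, alleles in IUPAC_BIALLELIC.items():
        if alleles == pair:
            return code
    raise ValueError(f"no two-allele code for {allele1}/{allele2}")


def polymorphic_sites(sequence: str):
    """Yield (1-based position, code, alleles) for each ambiguity code in a sequence.

    Ordinary bases (a, c, g, t) are not keys of the table, so they are skipped.
    """
    for pos, base in enumerate(sequence, start=1):
        code = base.lower()
        if code in IUPAC_BIALLELIC:
            yield pos, code, IUPAC_BIALLELIC[code]
```

For example, scanning the made-up string `"acgrttyca"` reports an A/G site at position 4 and a C/T site at position 7.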

DETAILED DESCRIPTION OF THE INVENTION

The identification of genes involved in a particular trait such as a specific central
nervous system disorder, like schizophrenia, can be carried out through two main strategies
currently used for genetic mapping: linkage analysis and association studies. Linkage analysis
requires the study of families with multiple affected individuals and is now useful in the
detection of mono- or oligogenic inherited traits. Conversely, association studies examine the
frequency of marker alleles in unrelated trait-positive (T+) individuals compared with
trait-negative (T-) controls, and are generally employed in the detection of polygenic inheritance.
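The case/control comparison described above can be illustrated with a Pearson chi-square test on a 2x2 table of allele counts. This is a minimal sketch of one common single-marker test, not necessarily the statistic used in the analyses of the invention (the figures refer to haplotype tests such as Omnibus LR and HaploMax). It assumes every row and column total is non-zero, and the function name is hypothetical.

```python
import math


def allelic_association(case_counts, control_counts):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    case_counts:    (allele 1 count, allele 2 count) in trait-positive individuals
    control_counts: (allele 1 count, allele 2 count) in trait-negative controls
    Returns (chi2, p_value, odds_ratio for allele 1 in cases versus controls).
    Assumes all row and column totals are non-zero.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio
```

With 60/40 allele counts in cases and 40/60 in controls, every expected count is 50, giving a statistic of 8.0 and an allelic odds ratio of 2.25.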

Candidate region on the chromosome 13 (linkage analysis) 

Genetic link or "linkage" is based on an analysis of which of two neighboring
sequences on a chromosome shows the fewest recombinations by crossing-over during meiosis.
To do this, chromosomal markers, like microsatellite markers, have been localized with
precision on the genome. Genetic linkage analysis calculates the probabilities of recombination
between the target gene and the chromosomal markers used, according to the genealogical tree,
the transmission of the disease, and the transmission of the markers. Thus, if a particular allele
of a given marker is transmitted with the disease more often than chance would have it
(recombination fraction between 0 and 0.5), it is possible to deduce that the target gene in
question is found in the neighborhood of the marker.
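These recombination probabilities are conventionally summarized as a LOD score, the base-10 log of the likelihood of the data at a recombination fraction theta relative to free recombination (theta = 0.5); a LOD of 3 or more is the usual threshold for declaring linkage. The sketch below is a simplification that assumes phase-known, fully informative meioses, so the likelihood reduces to counting recombinants; real pedigree analysis also models penetrance and unknown phase. The function names are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10[theta^R * (1 - theta)^(N - R) / 0.5^N] for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    r, n = recombinants, meioses
    if theta == 0.0:
        # A single observed recombinant rules out complete linkage.
        return -math.inf if r > 0 else n * math.log10(2.0)
    return (r * math.log10(theta)
            + (n - r) * math.log10(1.0 - theta)
            - n * math.log10(0.5))


def max_lod(recombinants: int, meioses: int, step: float = 0.01):
    """Grid search over theta in [0, 0.5]; returns (best LOD, theta)."""
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    return max((lod_score(recombinants, meioses, t), t) for t in grid)
```

For instance, 2 recombinants in 20 informative meioses give a maximum LOD of about 3.2 at theta = 0.1.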

Using this technique, it has been possible to localize several genes demonstrating a
genetic predisposition to familial cancers. In order to be included in a genetic linkage
study, the families affected by a hereditary form of the disease must satisfy the
"informativeness" criteria: several affected subjects (whose constitutional DNA is
available) per generation and, ideally, a large number of siblings.

By linkage analysis, observations have been made according to which a candidate
region for schizophrenia is present on the chromosome 13q32 locus (Blouin et al., 1998). Linkage
analysis has been successfully applied to map simple genetic traits that show clear Mendelian
inheritance patterns and which have a high penetrance, but this method suffers from a variety of
drawbacks. First, linkage analysis is limited by its reliance on the choice of a genetic model
suitable for each studied trait. Furthermore, the resolution attainable using linkage analysis is
limited, and complementary studies are required to refine the analysis of the typical 20 Mb
regions initially identified through this method. In addition, linkage analysis has proven
difficult when applied to complex genetic traits, such as those due to the combined action of
multiple genes and/or environmental factors. In such cases, too great an effort and cost are
needed to recruit the adequate number of affected families required for applying linkage
analysis to these situations. Finally, linkage analysis cannot be applied to the study of traits for
which no large informative families are available.

In the present invention, association studies, rather than linkage analysis, are disclosed as
an alternative means of relating markers located on the chromosome 13q31-q33 locus to a trait,
preferably schizophrenia or bipolar disorder.

In the present application, additional biallelic markers located on the human
chromosome 13q31-q33 locus associated with schizophrenia are disclosed. The identification
of these biallelic markers in association with schizophrenia has allowed for the further definition
of the chromosomal region suspected of containing a genetic determinant involved in a
predisposition to develop schizophrenia and has resulted in the identification of novel gene
sequences disclosed herein which are associated with a predisposition to develop schizophrenia.
The present invention thus provides an extensive fine structure map of the 13q31-q33 locus,
including novel biallelic markers located on the human 13q31-q33 locus, approximately 319 kb
of genomic nucleotide sequence of a subregion of the human 13q31-q33 locus, and
polymorphisms including biallelic markers and nucleotide deletions in said 319 kb genomic
sequence. The biallelic markers of the human chromosome 13q31-q33 locus and the nucleotide
sequences, polymorphisms and gene sequences located in the Region D subregion of the human
chromosome 13q31-q33 locus are useful as genetic and physical markers for further mapping
studies. The approximately 319 kb of genomic nucleotide sequence disclosed herein can further
serve as a reference in genetic or physical analysis of deletions, substitutions, and insertions in
that region. Additionally, the sequence information provides a resource for the further
identification of new genes in that region. The sequences comprising the
schizophrenia-associated genes are also useful, for example, for the isolation of other genes in
putative gene families, the identification of homologs from other species, treatment of disease
and as probes and primers for diagnostic or screening assays as described herein.

These identified polymorphisms are used in the design of assays for the reliable
detection of genetic susceptibility to schizophrenia and bipolar disorder. They can also be used
in the design of drug screening protocols to provide an accurate and efficient evaluation of the
therapeutic and side-effect potential of new or already existing medicaments or treatment
regimes.



Definitions 

As used interchangeably herein, the terms "oligonucleotides" and "polynucleotides"
include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide in either
single chain or duplex form. The term "nucleotide" is used herein as an adjective to describe
molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in single-stranded
or duplex form. The term "nucleotide" is also used herein as a noun to refer to
individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a
larger nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar
moiety, and a phosphate group, or phosphodiester linkage in the case of nucleotides within an
oligonucleotide or polynucleotide. The term "nucleotide" is also used herein to encompass
"modified nucleotides", which comprise at least one of the following modifications: (a) an
alternative linking group, (b) an analogous form of purine, (c) an analogous form of pyrimidine,
or (d) an analogous sugar. For examples of analogous linking groups, purines, pyrimidines, and
sugars, see for example PCT publication No. WO 95/04064. However, the polynucleotides of the
invention are preferably comprised of greater than 50% conventional deoxyribose nucleotides, and
most preferably greater than 90% conventional deoxyribose nucleotides. The polynucleotide
sequences of the invention may be prepared by any known method, including synthetic,
recombinant, ex vivo generation, or a combination thereof, as well as utilizing any purification
methods known in the art.

The term "purified" is used herein to describe a polynucleotide or polynucleotide vector
of the invention which has been separated from other compounds including, but not limited to,
other nucleic acids, carbohydrates, lipids and proteins (such as the enzymes used in the
synthesis of the polynucleotide), or the separation of covalently closed polynucleotides from
linear polynucleotides. A polynucleotide is substantially pure when at least about 50%,
preferably 60 to 75% of a sample exhibits a single polynucleotide sequence and conformation
(linear versus covalently closed). A substantially pure polynucleotide typically comprises about
50%, preferably 60 to 90% weight/weight of a nucleic acid sample, more usually about 95%,
and preferably is over about 99% pure. Polynucleotide purity or homogeneity may be indicated
by a number of means well known in the art, such as agarose or polyacrylamide gel
electrophoresis of a sample, followed by visualizing a single polynucleotide band upon staining
the gel. For certain purposes higher resolution can be provided by using HPLC or other means
well known in the art.

The term "isolated" requires that the material be removed from its original environment
(e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring
polynucleotide or polypeptide present in a living animal is not isolated, but the same
polynucleotide or DNA or polypeptide, separated from some or all of the coexisting materials in
the natural system, is isolated. Such a polynucleotide could be part of a vector and/or such a
polynucleotide or polypeptide could be part of a composition, and still be isolated in that the
vector or composition is not part of its natural environment.

The term "primer" denotes a specific oligonucleotide sequence which is complementary
to a target nucleotide sequence and used to hybridize to the target nucleotide sequence. A
primer serves as an initiation point for nucleotide polymerization catalyzed by either DNA
polymerase, RNA polymerase or reverse transcriptase.

The term "probe" denotes a defined nucleic acid segment (or nucleotide analog
segment, e.g., polynucleotide as defined herein) which can be used to identify a specific
polynucleotide sequence present in samples, said nucleic acid segment comprising a nucleotide
sequence complementary to the specific polynucleotide sequence to be identified.

The terms "trait" and "phenotype" are used interchangeably herein and refer to any
clinically distinguishable, detectable or otherwise measurable property of an organism such as
symptoms of, or susceptibility to, a disease for example. Typically the terms "trait" or
"phenotype" are used herein to refer to symptoms of, or susceptibility to, schizophrenia or
bipolar disorder; or to refer to an individual's response to an agent acting on schizophrenia or
bipolar disorder; or to refer to symptoms of, or susceptibility to, side effects of an agent acting on
schizophrenia or bipolar disorder.

The term "allele" is used herein to refer to variants of a nucleotide sequence. A biallelic
polymorphism has two forms. Typically the first identified allele is designated as the original
allele whereas other alleles are designated as alternative alleles. Diploid organisms may be
homozygous or heterozygous for an allelic form.

The term "heterozygosity rate" is used herein to refer to the incidence of individuals in
a population who are heterozygous at a particular marker. In a biallelic system the
heterozygosity rate is on average equal to $2P_a(1-P_a)$, where $P_a$ is the frequency of the least
common allele. In order to be useful in genetic studies a genetic marker should have an
adequate level of heterozygosity to allow a reasonable probability that a randomly selected
person will be heterozygous.
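The expected value $2P_a(1-P_a)$ holds under Hardy-Weinberg proportions; the observed rate is simply the fraction of heterozygous individuals. A minimal sketch of both, with hypothetical function names and genotypes represented as pairs of allele strings:

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) of a biallelic marker with allele
    frequency p, assuming Hardy-Weinberg proportions."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * p * (1.0 - p)


def observed_heterozygosity(genotypes) -> float:
    """Fraction of individuals whose two alleles differ, e.g. ("A", "G")."""
    genotypes = list(genotypes)
    if not genotypes:
        raise ValueError("no genotypes supplied")
    return sum(1 for a1, a2 in genotypes if a1 != a2) / len(genotypes)
```

A less common allele frequency of 0.2 gives an expected heterozygosity of 0.32, and 0.3 gives 0.42, matching the figures quoted in the definition of a biallelic marker.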

The term "genotype" as used herein refers to the identity of the alleles present in an
individual or a sample. In the context of the present invention a genotype preferably refers to
the description of the biallelic marker alleles present in an individual or a sample. The term
"genotyping" a sample or an individual for a biallelic marker involves determining the specific
allele or the specific nucleotide(s) carried by an individual at a biallelic marker.

The term " mutation " as used herein refers to a difference in DNA sequence between or 
among different genomes or individuals which has a frequency below 1%. 

The term " haplotype " refers to a combination of alleles present in an individual or a 
sample on a single chromosome. In the context of the present invention a haplotype preferably 
refers to a combination of biallelic marker alleles found in a given individual and which may be 
associated with a phenotype. 
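When the phase of each chromosome is known, haplotype frequencies can be estimated by direct counting. This is a minimal sketch under that assumption; in unrelated individuals phase is usually unknown and frequencies are typically estimated statistically instead. The function name and the example alleles are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(chromosomes, markers):
    """Estimate haplotype frequencies from phase-known chromosomes.

    chromosomes: iterable of dicts mapping marker name -> allele, one dict per
                 chromosome (a diploid individual contributes two dicts).
    markers:     ordered marker names defining the haplotype.
    """
    counts = Counter(tuple(chrom[m] for m in markers) for chrom in chromosomes)
    total = sum(counts.values())
    return {haplotype: n / total for haplotype, n in counts.items()}
```

For four chromosomes carrying G-T, G-T, A-C and G-C at two markers, the G-T haplotype has frequency 0.5 and the other two 0.25 each.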

The term " polymorphism " as used herein refers to the occurrence of two or more 
alternative genomic sequences or alleles between or among different genomes or individuals. 
"Polymorphic" refers to the condition in which two or more variants of a specific genomic 
sequence can be found in a population. A "polymorphic site" is the locus at which the variation 
occurs. A polymorphism may comprise a substitution, deletion or insertion of one or more 
nucleotides. A single nucleotide polymorphism is a single base pair change. Typically a single 
nucleotide polymorphism is the replacement of one nucleotide by another nucleotide at the 
polymorphic site. Deletion of a single nucleotide or insertion of a single nucleotide also gives
rise to single nucleotide polymorphisms. In the context of the present invention "single 
nucleotide polymorphism" preferably refers to a single nucleotide substitution. Typically, 
between different genomes or between different individuals, the polymorphic site may be 
occupied by two different nucleotides. 

The terms "biallelic polymorphism" and "biallelic marker" are used interchangeably
herein to refer to a polymorphism having two alleles at a fairly high frequency in the population,
preferably a single nucleotide polymorphism. A "biallelic marker allele" refers to the
nucleotide variants present at a biallelic marker site. Typically the frequency of the less
common allele of the biallelic markers of the present invention has been validated to be greater
than 1%, preferably the frequency is greater than 10%, more preferably the frequency is at least
20% (i.e. heterozygosity rate of at least 0.32), even more preferably the frequency is at least
30% (i.e. heterozygosity rate of at least 0.42). A biallelic marker wherein the frequency of the
less common allele is 30% or more is termed a "high quality biallelic marker." All of the
genotyping, haplotyping, association, and interaction study methods of the invention may
optionally be performed solely with high quality biallelic markers.
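The frequency thresholds in the definitions of "mutation" (below 1%) and "biallelic marker" (above 1%, high quality at 30% or more) can be applied mechanically. A minimal sketch with hypothetical function names; a frequency of exactly 1% falls in neither definition and is reported as such.

```python
def minor_allele_frequency(count_allele1: int, count_allele2: int) -> float:
    """Frequency of the less common allele from allele counts."""
    total = count_allele1 + count_allele2
    if total == 0:
        raise ValueError("no alleles counted")
    return min(count_allele1, count_allele2) / total


def classify_variant(maf: float) -> str:
    """Label a variant using the frequency thresholds in the definitions above."""
    if maf < 0.01:
        return "mutation"
    if maf >= 0.30:
        return "high quality biallelic marker"
    if maf > 0.01:
        return "biallelic marker"
    return "unclassified (exactly 1%)"
```

For example, 70 and 30 observed copies of the two alleles give a less common allele frequency of 0.3, which qualifies as a high quality biallelic marker.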

The location of nucleotides in a polynucleotide with respect to the center of the
polynucleotide is described herein in the following manner. When a polynucleotide has an
odd number of nucleotides, the nucleotide at an equal distance from the 3' and 5' ends of the
polynucleotide is considered to be "at the center" of the polynucleotide, and any nucleotide
immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself, is
considered to be "within 1 nucleotide of the center." With an odd number of nucleotides in a
polynucleotide, any of the five nucleotide positions in the middle of the polynucleotide would
be considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an
even number of nucleotides, there would be a bond and not a nucleotide at the center of the
polynucleotide. Thus, either of the two central nucleotides would be considered to be "within 1
nucleotide of the center" and any of the four nucleotides in the middle of the polynucleotide
would be considered to be "within 2 nucleotides of the center", and so on. For polymorphisms
which involve the substitution, insertion or deletion of 1 or more nucleotides, the
polymorphism, allele or biallelic marker is "at the center" of a polynucleotide if the difference
between the distance from the substituted, inserted, or deleted polynucleotides of the
polymorphism and the 3' end of the polynucleotide, and the distance from the substituted,
inserted, or deleted polynucleotides of the polymorphism and the 5' end of the polynucleotide is
zero or one nucleotide. If this difference is 0 to 3, then the polymorphism is considered to be
"within 1 nucleotide of the center." If the difference is 0 to 5, the polymorphism is considered
to be "within 2 nucleotides of the center." If the difference is 0 to 7, the polymorphism is
considered to be "within 3 nucleotides of the center," and so on.
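Because a difference of 0 to 2k+1 places a polymorphism "within k nucleotides of the center", the tightest such k is the integer half of the difference, with k = 0 meaning "at the center". A minimal sketch, assuming 1-based positions and that the "distance" to each end is the number of nucleotides flanking the polymorphic base(s); the function name is hypothetical.

```python
from typing import Optional


def center_offset(length: int, start: int, end: Optional[int] = None) -> int:
    """Return the smallest k such that the polymorphism occupying 1-based
    positions start..end is "within k nucleotides of the center" of a
    polynucleotide of the given length (k == 0 means "at the center")."""
    if end is None:
        end = start
    if not 1 <= start <= end <= length:
        raise ValueError("polymorphism must lie inside the polynucleotide")
    to_5_end = start - 1        # nucleotides on the 5' side of the polymorphism
    to_3_end = length - end     # nucleotides on the 3' side of the polymorphism
    difference = abs(to_5_end - to_3_end)
    # 0-1 -> at center, 0-3 -> within 1, 0-5 -> within 2, 0-7 -> within 3, ...
    return difference // 2
```

For an 11-mer, position 6 is at the center and position 7 is within 1 nucleotide of it; for a 10-mer, either of positions 5 and 6 counts as at the center under this rule.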

The term "upstream" is used herein to refer to a location which is toward the 5' end of
the polynucleotide from a specific reference point.

The terms "base paired" and "Watson & Crick base paired" are used interchangeably
herein to refer to nucleotides which can be hydrogen bonded to one another by virtue of their
sequence identities in a manner like that found in double-helical DNA with thymine or uracil
residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues
linked by three hydrogen bonds (See Stryer, L., Biochemistry, 4th edition, 1995).

The terms "complementary" or "complement thereof" are used herein to refer to the
sequences of polynucleotides which are capable of forming Watson & Crick base pairing with
another specified polynucleotide throughout the entirety of the complementary region. This
term is applied to pairs of polynucleotides based solely upon their sequences and not any
particular set of conditions under which the two polynucleotides would actually bind.
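Because complementarity is defined on sequence alone, it can be checked without any model of binding conditions. A minimal sketch, assuming both strands are written 5' to 3' (so they pair antiparallel) and allowing the A-T, A-U and G-C pairs named above; the function names are hypothetical.

```python
# Watson & Crick pairs, including uracil pairing with adenine.
WATSON_CRICK = {frozenset("AT"), frozenset("AU"), frozenset("GC")}


def reverse_complement(seq: str) -> str:
    """DNA reverse complement; input and output are written 5' to 3'."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def is_complementary(strand1: str, strand2: str) -> bool:
    """True if two strands, each written 5' to 3', can form Watson & Crick
    base pairs along their entire length when aligned antiparallel."""
    if len(strand1) != len(strand2):
        return False
    return all(frozenset((a.upper(), b.upper())) in WATSON_CRICK
               for a, b in zip(strand1, reversed(strand2)))
```

For example, `reverse_complement("ATGC")` is `"GCAT"`, and the two strands are complementary in the sense defined above.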

The term "sbgl gene", when used herein, encompasses genomic, mRNA and cDNA
sequences encoding the sbgl protein, including the untranslated regulatory regions of the
genomic DNA.

The term "g34665 gene", when used herein, encompasses genomic, mRNA and
cDNA sequences encoding the g34665 protein, including the untranslated regulatory regions of
the genomic DNA.

The term "sbg2 gene", when used herein, encompasses genomic, mRNA and cDNA
sequences encoding the sbg2 protein, including the untranslated regulatory regions of the
genomic DNA.

The term "g35017 gene", when used herein, encompasses genomic, mRNA and
cDNA sequences encoding the g35017 protein, including the untranslated regulatory regions of
the genomic DNA.

The term "g35018 gene", when used herein, encompasses genomic, mRNA and
cDNA sequences encoding the g35018 protein, including the untranslated regulatory regions of
the genomic DNA.

As used herein the term "13q31-q33-related biallelic marker" relates to a set of biallelic
markers residing in the human chromosome 13q31-q33 region. The term 13q31-q33-related
biallelic marker encompasses all of the biallelic markers disclosed in Table 6b and any biallelic
markers in linkage disequilibrium therewith, as well as any biallelic markers disclosed in Table
6c and any biallelic markers in linkage disequilibrium therewith. The preferred chromosome
13q31-q33-related biallelic marker alleles of the present invention include each one of the alleles
described in Table 6b individually or in groups consisting of all the possible combinations of
the alleles listed.

As used herein the term "Region D-related biallelic marker" relates to a set of biallelic
markers in linkage disequilibrium with the subregion of the chromosome 13q31-q33 region
referred to herein as Region D. The term Region D-related biallelic marker encompasses the
biallelic markers A1 to A242, A249 to A251, A257 to A263, A269 to A270, A278, A285 to
A299, A303 to A307, A324, A330, A334 to A335, A346 to A357 and A361 to A489 disclosed in
Table 6b and any biallelic markers in linkage disequilibrium with markers A1 to A242, A249 to
A251, A257 to A263, A269 to A270, A278, A285 to A299, A303 to A307, A324, A330, A334
to A335, A346 to A357 and A361 to A489.

As used herein the term "sbgl-related biallelic marker" relates to a set of biallelic
markers in linkage disequilibrium with the sbgl gene or an sbgl nucleotide sequence. The term
sbgl-related biallelic marker encompasses the biallelic markers A85 to A219 disclosed in Table
6b and any biallelic markers in linkage disequilibrium therewith.

As used herein the term "g34665-related biallelic marker" relates to a set of biallelic
markers in linkage disequilibrium with the g34665 gene or a g34665 nucleotide sequence. The
term g34665-related biallelic marker encompasses the biallelic markers A230 to A236 disclosed
in Table 6b and any biallelic markers in linkage disequilibrium therewith.

As used herein the term "sbg2-related biallelic marker" relates to a set of biallelic
markers in linkage disequilibrium with the sbg2 gene or an sbg2 nucleotide sequence. The term
sbg2-related biallelic marker encompasses the biallelic markers A79 to A99 disclosed in Table
6b and any biallelic markers in linkage disequilibrium therewith.

As used herein the term "g35017-related biallelic marker" relates to a set of biallelic
markers in linkage disequilibrium with the g35017 gene or a g35017 nucleotide sequence. The
term g35017-related biallelic marker encompasses biallelic marker A41 disclosed in Table 6b
and any biallelic markers in linkage disequilibrium therewith.

As used herein the term "g35018-related biallelic marker" relates to a set of biallelic
markers in linkage disequilibrium with the g35018 gene or a g35018 nucleotide sequence. The
term g35018-related biallelic marker encompasses the biallelic markers A1 to A39 disclosed in
Table 6b and any biallelic markers in linkage disequilibrium therewith.
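The marker sets defined above are lists of numeric ranges, so membership can be looked up directly. The sketch below encodes only the Table 6b ranges quoted in these definitions; it cannot capture the additional markers "in linkage disequilibrium therewith", which would require genotype data. The table and function names are hypothetical.

```python
import re

# Ranges of Table 6b marker numbers quoted in the definitions above (inclusive).
MARKER_SETS = {
    "Region D": [(1, 242), (249, 251), (257, 263), (269, 270), (278, 278),
                 (285, 299), (303, 307), (324, 324), (330, 330), (334, 335),
                 (346, 357), (361, 489)],
    "sbgl": [(85, 219)],
    "g34665": [(230, 236)],
    "sbg2": [(79, 99)],
    "g35017": [(41, 41)],
    "g35018": [(1, 39)],
}


def marker_number(marker_id: str) -> int:
    """Parse an identifier such as "A85" into its number."""
    match = re.fullmatch(r"A(\d+)", marker_id.strip())
    if match is None:
        raise ValueError(f"unrecognised marker identifier: {marker_id!r}")
    return int(match.group(1))


def marker_sets_for(marker_id: str) -> list:
    """Names of the listed marker sets whose ranges include this marker."""
    n = marker_number(marker_id)
    return [name for name, ranges in MARKER_SETS.items()
            if any(lo <= n <= hi for lo, hi in ranges)]
```

As the definitions state, some sets overlap: marker A90, for example, falls within the Region D, sbgl-related and sbg2-related ranges.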

The term "polypeptide" refers to a polymer of amino acids without regard to the length
of the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of
polypeptide. This term also does not specify or exclude post-expression modifications of
polypeptides; for example, polypeptides which include the covalent attachment of glycosyl
groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by
the term polypeptide. Also included within the definition are polypeptides which contain one or
more analogs of an amino acid (including, for example, non-naturally occurring amino acids,
amino acids which only occur naturally in an unrelated biological system, modified amino acids
from mammalian systems etc.), polypeptides with substituted linkages, as well as other
modifications known in the art, both naturally occurring and non-naturally occurring.

The term " purified " is used herein to describe a polypeptide of the invention which has 
been separated from other compounds including, but not limited to nucleic acids, lipids, 
carbohydrates and other proteins. A polypeptide is substantially pure when at least about 50%, 
preferably 60 to 75% of a sample exhibits a single polypeptide sequence. A substantially pure 
polypeptide typically comprises about 50%, preferably 60 to 90% weight/weight of a protein 
sample, more usually about 95%, and preferably is over about 99% pure. Polypeptide purity or 
homogeneity is indicated by a number of means well known in the art, such as agarose or
polyacrylamide gel electrophoresis of a sample, followed by visualizing a single polypeptide 
band upon staining the gel. For certain purposes higher resolution can be provided by using 
HPLC or other means well known in the art. 
As used herein, the term "non-human animal" refers to any non-human vertebrate, including birds
and, more usually, mammals, preferably primates, farm animals such as swine, goats, sheep,
donkeys, and horses, rabbits or rodents, more preferably rats or mice. As used herein, the term
"animal" is used to refer to any vertebrate, preferably a mammal. Both the terms "animal" and
"mammal" expressly embrace human subjects unless preceded with the term "non-human".

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides
which are comprised of at least one binding domain, where an antibody binding domain is
formed from the folding of variable domains of an antibody molecule to form three-dimensional
binding spaces with an internal surface shape and charge distribution complementary to the
features of an antigenic determinant of an antigen, which allows an immunological reaction
with the antigen. Antibodies include recombinant proteins comprising the binding domains, as
well as fragments, including Fab, Fab', F(ab)2, and F(ab')2 fragments.

As used herein, an "antigenic determinant" is the portion of an antigen molecule, in this 
case an sbg1 polypeptide, that determines the specificity of the antigen-antibody reaction. An 
"epitope" refers to an antigenic determinant of a polypeptide. An epitope can comprise as few 
as 3 amino acids in a spatial conformation which is unique to the epitope. Generally an epitope 
comprises at least 6 such amino acids, and more usually at least 8-10 such amino acids. 
Methods for determining the amino acids which make up an epitope include x-ray 
crystallography, 2-dimensional nuclear magnetic resonance, and epitope mapping, e.g. the 
Pepscan method described by Geysen et al. 1984; PCT Publication No. WO 84/03564; and 
PCT Publication No. WO 84/03506. 
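Peptide-scanning approaches to epitope mapping, such as the Pepscan method cited above, rely on a set of short overlapping peptides that tile the antigen sequence. The sketch below merely enumerates such a set, assuming a fixed peptide length and a fixed offset between consecutive peptides; the function name and the default length of 8 residues (taken from the usual epitope size given above) are illustrative choices, not parameters from Geysen et al. or the cited PCT publications.

```python
def overlapping_peptides(protein: str, length: int = 8, step: int = 1) -> list[str]:
    """Return peptides of `length` residues, each shifted by `step` residues.

    Hypothetical helper for illustration only: an actual peptide scan also has
    to deal with synthesis chemistry, controls and peptide anchoring.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    protein = protein.upper()
    return [protein[i:i + length] for i in range(0, len(protein) - length + 1, step)]


# 8-residue peptides tiling a 10-residue toy sequence
print(overlapping_peptides("MKTAYIAKQR"))  # ['MKTAYIAK', 'KTAYIAKQ', 'TAYIAKQR']
```

A one-residue offset means that every possible linear stretch of the chosen length is represented once, which is what allows an epitope boundary to be located to a single residue.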



Variants and Fragments 

The invention also relates to variants and fragments of the polynucleotides described 
herein, particularly of a nucleotide sequence of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, 
and particularly of a nucleotide sequence of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 
containing one or more biallelic markers and/or other polymorphisms according to the 
invention. 

Variants of polynucleotides, as the term is used herein, are polynucleotides that differ 
from a reference polynucleotide. A variant of a polynucleotide may be a naturally occurring 
variant such as a naturally occurring allelic variant, or it may be a variant that is not known to 
occur naturally. Such non-naturally occurring variants of the polynucleotide may be made by 
mutagenesis techniques, including those applied to polynucleotides, cells or organisms. 
Generally, differences are limited so that the nucleotide sequences of the reference and the 
variant are closely similar overall and, in many regions, identical. 

Variants of polynucleotides according to the invention include, without being limited to, 
nucleotide sequences which are at least 95% identical to a polynucleotide selected from the 
group consisting of the nucleotide sequences of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, or to 
any polynucleotide fragment of at least 8 consecutive nucleotides of a polynucleotide selected 
from the group consisting of the nucleotide sequences of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, and 
preferably at least 99% identical, more particularly at least 99.5% identical, and most preferably 
at least 99.8% identical to a polynucleotide selected from the group consisting of the nucleotide 
sequences of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, or to any polynucleotide fragment of at least 30, 
35, 40, 50, 70, 80, 100, 250, 500, 1000 or 2000 consecutive nucleotides (to the extent that the length is 
consistent with the particular SEQ ID) of a polynucleotide selected from the group 
consisting of the nucleotide sequences of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229. 

Nucleotide changes present in a variant polynucleotide may be silent, which means that 
they do not alter the amino acids encoded by the polynucleotide. However, nucleotide changes 
may also result in amino acid substitutions, additions, deletions, fusions and truncations in the 
polypeptide encoded by the reference sequence. The substitutions, deletions or additions may 
involve one or more nucleotides. The variants may be altered in coding or non-coding regions 
or both. Alterations in the coding regions may produce conservative or non-conservative amino 
acid substitutions, deletions or additions. 

A polynucleotide fragment is a polynucleotide having a sequence that is entirely the 
same as part but not all of a given nucleotide sequence, preferably the nucleotide sequence of an 
sbg1 polynucleotide, and variants thereof, or of a polynucleotide of any of SEQ ID Nos 1 to 26, 
36 to 40 and 54 to 229, or a polynucleotide comprising one of the biallelic markers A1 to A360 
or polymorphisms A361 to A489, or the complements thereof. Such fragments may be 
"free-standing", i.e. not part of or fused to other polynucleotides, or they may be comprised within a 
single larger polynucleotide of which they form a part or region. Indeed, several of these 
fragments may be present within a single larger polynucleotide. Optionally, such fragments 
may comprise, consist of, or consist essentially of a contiguous span of at least 8, 10, 12, 15, 18, 
20, 25, 30, 35, 40, 50, 70, 80, 100, 250, 500, 1000 or 2000 nucleotides in length of any of SEQ 
ID Nos 1 to 26, 36 to 40 and 54 to 229. 
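The fragment definition above (a sequence identical to part, but not all, of a reference sequence or of its complement) can be expressed as a simple test. The sketch below is illustrative only: it assumes plain DNA strings over A, C, G and T, treats "complement" as the reverse complement read 5' to 3', and uses the smallest span listed above (8 nucleotides) as a hypothetical default minimum length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_fragment(candidate: str, reference: str, min_len: int = 8) -> bool:
    """True if `candidate` matches part, but not all, of `reference` or of its complement."""
    candidate, reference = candidate.upper(), reference.upper()
    # A fragment must be at least min_len long and strictly shorter than the reference.
    if not (min_len <= len(candidate) < len(reference)):
        return False
    return candidate in reference or candidate in reverse_complement(reference)


print(is_fragment("GCGTACGT", "ATGGCGTACGTTAGC"))  # True
```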

Identity Between Nucleic Acids Or Polypeptides 
The terms "percentage of sequence identity" and "percentage homology" are used 
interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and 
are determined by comparing two optimally aligned sequences over a comparison window, 
wherein the portion of the polynucleotide or polypeptide sequence in the comparison window 
may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which 
does not comprise additions or deletions) for optimal alignment of the two sequences. The 
percentage is calculated by determining the number of positions at which the identical nucleic 
acid base or amino acid residue occurs in both sequences to yield the number of matched 
positions, dividing the number of matched positions by the total number of positions in the 
window of comparison and multiplying the result by 100 to yield the percentage of sequence 
identity. Homology is evaluated using any of the variety of sequence comparison algorithms 
and programs known in the art. Such algorithms and programs include, but are by no means 
limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW (Pearson and Lipman, 
1988, Proc. Natl. Acad. Sci. USA 85(8):2444-2448; Altschul et al., 1990, J. Mol. Biol. 
215(3):403-410; Thompson et al., 1994, Nucleic Acids Res. 22(22):4673-4680; Higgins et al., 
1996, Methods Enzymol. 266:383-402; Altschul et al., 1993, Nature Genetics 3:266-272). 
In a particularly preferred embodiment, protein and nucleic acid sequence homologies are 
evaluated using the Basic Local Alignment Search Tool ("BLAST"), which is well known in 
the art (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2264-2268; 
Altschul et al., 1990, J. Mol. Biol. 215:403-410; Altschul et al., 1993, Nature Genetics 
3:266-272; Altschul et al., 1997, Nucleic Acids Res. 25:3389-3402). In particular, five specific 
BLAST programs are used to perform the following tasks: 

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein 
sequence database; 

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence 
database; 

(3) BLASTX compares the six-frame conceptual translation products of a query 
nucleotide sequence (both strands) against a protein sequence database; 

(4) TBLASTN compares a query protein sequence against a nucleotide sequence 
database translated in all six reading frames (both strands); and 

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence 
against the six-frame translations of a nucleotide sequence database. 
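The percentage of sequence identity defined at the start of this section can be computed directly once an optimal alignment is available, for example from one of the programs listed above. The following is a minimal sketch, assuming the two aligned sequences are supplied as equal-length strings with "-" marking gaps, and that every alignment column, including gap columns, counts as a position in the window of comparison. The function name and gap symbol are illustrative assumptions; the sketch does not perform the alignment itself.

```python
def percent_identity(aligned_query: str, aligned_ref: str, gap: str = "-") -> float:
    """Matched positions / positions in the comparison window * 100."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("empty comparison window")
    # A position is matched when both sequences carry the same base or residue
    # (a gap facing a gap is not counted as a match).
    matches = sum(
        1
        for q, r in zip(aligned_query.upper(), aligned_ref.upper())
        if q == r and q != gap
    )
    return 100.0 * matches / len(aligned_query)


# 7 matched positions in a 9-position window (one gap, one mismatch)
print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))  # 77.8
```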

The BLAST programs identify homologous sequences by identifying similar segments, 
which are referred to herein as "high-scoring segment pairs," between a query amino or nucleic 
acid sequence and a test sequence which is preferably obtained from a protein or nucleic acid 
sequence database. High-scoring segment pairs are preferably identified (i.e., aligned) by 
means of a scoring matrix, many of which are known in the art. Preferably, the scoring matrix 
used is the BLOSUM62 matrix (Gonnet et al., 1992, Science 256:1443-1445; Henikoff and 
Henikoff, 1993, Proteins 17:49-61). Less preferably, the PAM or PAM250 matrices may also 
be used (see, e.g., Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distance 
Relationships: Atlas of Protein Sequence and Structure. Washington: National Biomedical 
Research Foundation). The BLAST programs evaluate the statistical significance of all 
high-scoring segment pairs identified, and preferably select those segments which satisfy a 
user-specified threshold of significance, such as a user-specified percent homology. Preferably, the 
statistical significance of a high-scoring segment pair is evaluated using the statistical 
significance formula of Karlin (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 
87:2264-2268). 
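For orientation, the Karlin-Altschul statistics referred to above give the number of high-scoring segment pairs with score at least S expected by chance between a query of length m and a database of total length n as E = K m n exp(-λS), where λ and K depend on the scoring matrix (and, for gapped alignments, on the gap costs). The minimal sketch below evaluates that expression and the corresponding normalized ("bit") score. The λ and K values in the usage lines are placeholders chosen for illustration, not published values for BLOSUM62 or any other matrix, and the effective-length corrections that BLAST applies in practice are ignored.

```python
import math


def expected_hsps(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S).

    m: query length, n: total database length, score: raw segment-pair score.
    Edge-effect (effective length) corrections are not applied.
    """
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


# Placeholder parameters, for illustration only
lam, k = 0.3, 0.1
print(expected_hsps(50, 300, 1_000_000, lam, k))
print(bit_score(50, lam, k))
```

Expressing significance through the bit score makes scores obtained with different matrices comparable, since λ and K are absorbed into S'.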

The BLAST programs may be used with the default parameters or with modified 
parameters provided by the user. 

Stringent Hybridization Conditions 

By way of example and not limitation, procedures using conditions of high stringency 
are as follows: Prehybridization of filters containing DNA is carried out for 8 h to overnight at 
65°C in buffer composed of 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 
0.02% Ficoll, 0.02% BSA, and 500 µg/ml denatured salmon sperm DNA. Filters are hybridized 
for 48 h at 65°C, the preferred hybridization temperature, in prehybridization mixture 
containing 100 ng/ml denatured salmon sperm DNA and 5-20 × 10^6 cpm of 32P-labeled probe. 
Subsequently, filter washes can be done at 37°C for 1 h in a solution containing 2X SSC, 0.01% 
PVP, 0.01% Ficoll, and 0.01% BSA, followed by a wash in 0.1X SSC at 50°C for 45 min. 
Following the wash steps, the hybridized probes are detectable by autoradiography. Other 
conditions of high stringency which may be used are well known in the art and as cited in 
Sambrook et al., 1989; and Ausubel et al., 1989. These hybridization conditions are suitable 
for a nucleic acid molecule of about 20 nucleotides in length. The hybridization conditions 
described above are to be adapted according to the length of the desired nucleic acid, following 
techniques well known to those skilled in the art. The suitable hybridization conditions may, 
for example, be adapted according to the teachings disclosed in the book of Hames and Higgins 
(1985) or in Sambrook et al. (1989). 
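Adapting these conditions to probe length usually starts from an estimate of the probe's melting temperature (Tm). As one illustration, the sketch below applies the Wallace rule, a rule of thumb for short oligonucleotides (roughly 14 to 20 nucleotides) that allots 2°C per A or T and 4°C per G or C. This rule is not part of the protocol above: it ignores salt and formamide concentrations, and longer probes call for a salt- and length-dependent formula instead.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate Tm (degrees C) of a short DNA oligonucleotide: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("expected an A/C/G/T sequence")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# A hypothetical 20-nucleotide probe with 10 G/C and 10 A/T residues
print(wallace_tm("ACGTACGTACGTACGTACGT"))  # 60
```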

Genomic Sequences of the polynucleotides of the invention 
The present invention concerns genomic DNA sequences of the sbg1, g34665, sbg2, 
g35017 and g35018 genes, as well as DNA sequences of the human chromosome 13q31-q33 
region, and more particularly, a subregion thereof referred to herein as region D. 

As referred to herein, genomic sequences of sbg2, g35017 and g35018 are indicated by 
nucleotide position in the 5' to 3' orientation on SEQ ID No 1. sbg1 and g34665 are transcribed 
in the opposite direction, i.e. from the nucleic acid strand complementary to SEQ ID No 1. 
Genomic sequences of sbg1 and g34665 are thus indicated by nucleotide position in the 3' to 5' 
orientation on SEQ ID No 1. 
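Because sbg1 and g34665 are read from the complementary strand, recovering their sequence in the usual 5' to 3' sense from coordinates on SEQ ID No 1 requires taking the reverse complement of the indicated interval. A minimal sketch, assuming 1-based, inclusive coordinates and a plain A/C/G/T string; the function name and the toy sequence are hypothetical and merely stand in for SEQ ID No 1.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_interval(seq_id_1: str, begin: int, end: int, strand: str = "+") -> str:
    """Return positions begin..end (1-based, inclusive) of `seq_id_1`, read 5' to 3'.

    strand "+": genes oriented 5' to 3' on SEQ ID No 1 (sbg2, g35017, g35018).
    strand "-": genes on the complementary strand (sbg1, g34665); the interval
                is reverse-complemented.
    """
    if not 1 <= begin <= end <= len(seq_id_1):
        raise ValueError("interval outside the sequence")
    sub = seq_id_1[begin - 1:end]
    if strand == "+":
        return sub
    if strand == "-":
        return sub.translate(COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")


toy = "AACCGGTTAC"  # hypothetical stand-in for SEQ ID No 1
print(extract_interval(toy, 2, 5, "+"), extract_interval(toy, 2, 5, "-"))  # ACCG CGGT
```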

Preferred nucleic acids of the invention include isolated, purified, or recombinant 
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 
70, 80, 90, 100, 150, 200, 500, 1000 or 2000 nucleotides of nucleotide positions 31 to 292651 
and 292844 to 319608 of SEQ ID No. 1, or the complements thereof. Further nucleic acids of 
the invention include isolated, purified, or recombinant polynucleotides comprising a 
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 
1000 or 2000 nucleotides, to the extent that the length of said span is consistent with the length 
of the SEQ ID, of SEQ ID Nos. 112 to 229. Optionally, said span is at least 12, 15, 18, 20, 25, 
30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 1000 or 2000 nucleotides of SEQ ID Nos. 112 
to 114, 115 to 117, 119, 121, 125 to 145, 147 to 150, 159 to 170, and 176 to 229. 

The invention also encompasses a purified, isolated, or recombinant polynucleotide 
comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity 
with a nucleotide sequence of nucleotide positions 31 to 292651 and 292844 to 319608 of 
SEQ ID No. 1, or a complementary sequence thereto or a fragment thereof. Another object of 
the invention consists of a purified, isolated, or recombinant nucleic acid that hybridizes with a 
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 
1000 or 2000 nucleotides of SEQ ID No 1 or a complementary sequence thereto or a variant 
thereof, under the stringent hybridization conditions as defined above. 

Additional preferred nucleic acids of the invention include isolated, purified, or 
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 
40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100 or 200 nucleotides of SEQ ID No 1 or the 
complements thereof, wherein said contiguous span comprises a biallelic marker. Optionally, 
said contiguous span comprises a biallelic marker selected from the group consisting of A1 to 
A69, A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, 
A199 to A222, and A224 to A242. Optionally allele 2 is present at the biallelic marker. It should be 
noted that nucleic acid fragments of any size and sequence may be comprised by the 
polynucleotides described in this section. 
Another particularly preferred set of nucleic acids of the invention includes isolated, 
purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 
25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 1000 or 2000 nucleotides, to the extent 
that such a length is consistent with the length of the particular range of nucleotide positions, of 
SEQ ID No. 1 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 
3, 5, or 10 nucleotide positions of any one of the following ranges of nucleotide positions, 
designated pos 1 to pos 166 of SEQ ID No. 1, listed in Table 1 below: 



Table 1

       Position in SEQ ID No 1          Position in SEQ ID No 1
Position  Beginning      End     Position  Beginning      End
pos 1            36     2000     pos 84       156001   158000
pos 2          2001     4000     pos 85       158001   160000
pos 3          4001     6000     pos 86       160001   162000
pos 4          6001     8000     pos 87       162001   164000
pos 5          8001    10000     pos 88       164001   166000
pos 6         10001    12000     pos 89       166001   168000
pos 7         12001    14000     pos 90       168001   170000
pos 8         14001    16000     pos 91       170001   172000
pos 9         16001    18000     pos 92       172001   174000
pos 10        18001    20000     pos 93       174001   176000
pos 11        20001    22000     pos 94       176001   178000
pos 12        22001    24000     pos 95       178001   180000
pos 13        24001    26000     pos 96       180001   182000
pos 14        26001    28000     pos 97       182001   184000
pos 15        28001    29966     pos 98       184001   186000
pos 16        30116    32000     pos 99       186001   188000
pos 17        32001    34000     pos 100      188001   190000
pos 18        34001    36000     pos 101      190001   192000
pos 19        36001    38000     pos 102      192001   194000
pos 20        38001    40000     pos 103      194001   196000
pos 21        40001    42000     pos 104      196001   198000
pos 22        42001    44000     pos 105      198001   200000
pos 23        44001    46000     pos 106      200001   201000
pos 24        46001    48000     pos 107      201001   202000
pos 25        48001    50000     pos 108      202001   204000
pos 26        50001    52000     pos 109      204001   206000
pos 27        52001    54000     pos 110      206001   208000
pos 28        54001    56000     pos 111      208001   210000
pos 29        56001    58000     pos 112      210001   212000
pos 30        58001    60000     pos 113      212001   214000
pos 31        60001    62000     pos 114      214001   216000
pos 32        62001    64000     pos 115      216001   218000
pos 33        64001    66000     pos 116      218001   220000
pos 34        66001    68000     pos 117      220001   222000
pos 35        68001    70000     pos 118      222001   224000
pos 36        70001    72000     pos 119      224001   226000
pos 37        72001    74000     pos 120      226001   228000
pos 38        74001    76000     pos 121      228001   230000
pos 39        76001    78000     pos 122      230001   232000
pos 40        78001    80000     pos 123      232001   234000
pos 41        80001    82000     pos 124      234001   236000
pos 42        82001    84000     pos 125      236001   238000
pos 43        84001    86000     pos 126      238001   240000
pos 44        86001    88000     pos 127      240001   242000
pos 45        88001    90000     pos 128      242001   244000
pos 46        90001    92000     pos 129      244001   246000
pos 47        92001    94000     pos 130      246001   248000
pos 48        94001    96000     pos 131      248001   250000
pos 49        96001    98000     pos 132      250001   252000
pos 50        98001   100000     pos 133      252001   254000
pos 51       100001   102000     pos 134      254001   256000
pos 52       102001   104000     pos 135      256001   258000
pos 53       104001   106000     pos 136      258001   260000
pos 54       106001   108000     pos 137      260001   262000
pos 55       108001   110000     pos 138      262001   264000
pos 56       110001   112000     pos 139      264001   266000
pos 57       102001   104000     pos 140      266001   268000
pos 58       104001   106000     pos 141      268001   270000
pos 59       106001   108000     pos 142      270001   272000
pos 60       108001   110000     pos 143      272001   274000
pos 61       110001   112000     pos 144      274001   276000
pos 62       112001   114000     pos 145      276001   278000
pos 63       114001   116000     pos 146      278001   280000
pos 64       116001   118000     pos 147      280001   282000
pos 65       118001   120000     pos 148      282001   284000
pos 66       120001   122000     pos 149      284001   286000
pos 67       122001   124000     pos 150      286001   288000
pos 68       124001   126000     pos 151      288001   290000
pos 69       126001   128000     pos 152      290001   292000
pos 70       128001   130000     pos 153      292001   294000
pos 71       130001   132000     pos 154      294001   296000
pos 72       132001   134000     pos 155      296001   298000
pos 73       134001   136000     pos 156      298001   300000
pos 74       136001   138000     pos 157      300001   302000
pos 75       138001   140000     pos 158      302001   304000
pos 76       140001   142000     pos 159      304001   306000
pos 77       142001   144000     pos 160      306001   308000
pos 78       144001   146000     pos 161      308001   310000
pos 79       146001   148000     pos 162      310001   312000
pos 80       148000   150000     pos 163      312001   314000
pos 81       150001   152000     pos 164      314001   316000
pos 82       152001   154000     pos 165      316001   318000
pos 83       154001   156000     pos 166      318001   319608
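Whether a given contiguous span of SEQ ID No 1 satisfies the condition stated before Table 1 (at least 1, 2, 3, 5 or 10 nucleotide positions falling within any one of the listed ranges) reduces to an interval-overlap count. A minimal sketch, assuming 1-based inclusive coordinates; only the first two rows of Table 1 are transcribed as an example, and the function names are hypothetical.

```python
# First two rows of Table 1, as 1-based inclusive positions in SEQ ID No 1
TABLE_1_EXCERPT = {
    "pos 1": (36, 2000),
    "pos 2": (2001, 4000),
}


def overlap(span: tuple[int, int], rng: tuple[int, int]) -> int:
    """Number of nucleotide positions shared by two inclusive intervals."""
    return max(0, min(span[1], rng[1]) - max(span[0], rng[0]) + 1)


def qualifying_ranges(span: tuple[int, int], table: dict, min_positions: int = 1) -> list[str]:
    """Names of the ranges in which the span covers at least `min_positions` positions."""
    return [name for name, rng in table.items() if overlap(span, rng) >= min_positions]


# A 21-nucleotide span straddling the pos 1 / pos 2 boundary
span = (1990, 2010)
print({name: overlap(span, rng) for name, rng in TABLE_1_EXCERPT.items()})  # {'pos 1': 11, 'pos 2': 10}
print(qualifying_ranges(span, TABLE_1_EXCERPT, min_positions=11))  # ['pos 1']
```

Because the condition applies to "any one" range, positions falling in two adjacent ranges are not pooled: the example span above contains 21 listed positions in total but only 11 in its best-covered range.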

The present invention encompasses the g34665, g34673, g34667, g35017 and g35018 
genes and nucleotide sequences. 
g34665 

In one aspect, the invention concerns g34665 genomic sequences consisting of, 
consisting essentially of, or comprising the sequence of nucleotide positions 292653 to 296047 
of SEQ ID No 1, a sequence complementary thereto, as well as fragments and variants thereof. 
These polynucleotides may be purified, isolated, or recombinant. 

Particularly preferred nucleic acids of the invention include isolated, purified, or 
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 
40, 50, 60, 70, 80, 90, 100, 150 or 200 nucleotides, to the extent that the length of said span is 
consistent with the nucleotide position range, of nucleotide positions 292653 to 292841, 295555 
to 296047 or 295980 to 296047 of SEQ ID No 1. Further preferred nucleic acids of the 
invention include isolated, purified, or recombinant polynucleotides comprising a contiguous 
span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150 or 200 nucleotides, to 
the extent that the length of said span is consistent with the nucleotide position range, of 
nucleotide positions 292653 to 292841, 295555 to 296047, or 295980 to 296047 of SEQ ID No 
1, or the complements thereof, wherein said contiguous span comprises a g34665-related 
biallelic marker. Optionally, said biallelic marker is selected from the group consisting of A230 
to A236. It should be noted that nucleic acid fragments of any size and sequence may also be 
comprised by the polynucleotides described in this section. 

The invention also encompasses a purified, isolated, or recombinant polynucleotide 
comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, 95, 97, 98 or 99% 
nucleotide identity with a nucleotide sequence of nucleotide positions 290653 to 292652, 
292653 to 296047, 292653 to 292841, 295555 to 296047, 295980 to 296047 and 296048 to 
298048 of SEQ ID No 1 or a complementary sequence thereto or a fragment thereof. The 
nucleotide differences with respect to nucleotide positions 290652 to 292652, 292653 to 296047, 
292653 to 292841, 295555 to 296047, 295980 to 296047 and 296048 to 298048 of SEQ ID No 
1 may be generally randomly distributed throughout the entire nucleic acid. Nevertheless, 
preferred nucleic acids are those wherein the nucleotide differences with respect to the nucleotide 
sequence of SEQ ID No 1 are predominantly located outside the coding sequences contained in 
the exons. These nucleic acids, as well as their fragments and variants, may be used as 
oligonucleotide primers or probes in order to detect the presence of a copy of the g34665 gene 
in a test sample, or alternatively in order to amplify a target nucleotide sequence within the 
g34665 sequences. 

Another object of the invention consists of a purified, isolated, or recombinant nucleic 
acid that hybridizes with a g34665 nucleotide sequence of any of nucleotide positions 292653 to 
296047, 292653 to 292841, 295555 to 296047, 295980 to 296047 and 296048 to 298048 of SEQ 
ID No 1 or a complementary sequence thereto or a variant thereof, under the stringent 
hybridization conditions as defined above. 

The g34665 genomic nucleic acid comprises at least 3 exons. The exon positions in 
SEQ ID No 1 are detailed below in Table 2. 

Table 2

       Position in SEQ ID No 1          Position in SEQ ID No 1
Exon      Beginning      End     Intron    Beginning      End
B            292653   292841     B-Ab         292842   295554
Ab           295555   296047     B-A          292842   295979
A            295980   296047


Thus, the invention embodies purified, isolated, or recombinant polynucleotides 
comprising a nucleotide sequence selected from the group consisting of the 3 exons of the 
g34665 gene, or a sequence complementary thereto. The invention also deals with purified, 
isolated, or recombinant nucleic acids comprising a combination of two exons of the g34665 
gene. 

Intron B-Ab refers to the nucleotide sequence located between Exon B and Exon Ab, 
and so on. The positions of the introns are detailed in Table 2. Thus, the invention embodies 
purified, isolated, or recombinant polynucleotides comprising a nucleotide sequence selected 
from the group consisting of the 2 introns of the g34665 gene, or a sequence complementary 
thereto. 
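Each intron in Table 2 covers the positions lying between the end of Exon B and the beginning of the exon it is joined to (Exon Ab or Exon A). The sketch below recomputes those intron boundaries from the exon coordinates as a consistency check, assuming 1-based inclusive positions on SEQ ID No 1 and treating the two exon pairings as alternatives; it is not a method for predicting splice sites, and the function name is hypothetical.

```python
# g34665 exon coordinates from Table 2 (1-based, inclusive, on SEQ ID No 1)
EXONS = {"B": (292653, 292841), "Ab": (295555, 296047), "A": (295980, 296047)}


def intron_between(first: str, second: str, exons: dict = EXONS) -> tuple[int, int]:
    """Positions lying strictly between two exons, ordered by position on SEQ ID No 1."""
    (b1, e1), (b2, e2) = sorted([exons[first], exons[second]])
    if b2 <= e1:
        raise ValueError("exons overlap; no intron between them")
    return e1 + 1, b2 - 1


for pair in (("B", "Ab"), ("B", "A")):
    begin, end = intron_between(*pair)
    print(f"Intron {pair[0]}-{pair[1]}: {begin} to {end} ({end - begin + 1} nt)")
# Intron B-Ab: 292842 to 295554 (2713 nt)
# Intron B-A: 292842 to 295979 (3138 nt)
```

Exons Ab and A share the same end position and overlap, which is why they are treated as alternative partners of Exon B rather than as consecutive exons.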

While this section is entitled "Genomic Sequences of g34665," it should be noted that 
nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides 
described in this section, flanking the genomic sequences of g34665 on either side or between 
two or more such genomic sequences. 

A g34665 polynucleotide or gene may further contain regulatory sequences both in the 
non-coding 5'-flanking region and in the non-coding 3'-flanking region that border the region 
containing said genes or exons. 

Polynucleotides derived from 5' and 3' regulatory regions are useful in order to detect 
the presence of at least a copy of a nucleotide sequence comprising a g34665 nucleotide 
sequence of SEQ ID No. 1 or a fragment thereof in a test sample. Polynucleotides carrying the 
regulatory elements located at the 5' end and at the 3' end of the genes comprising the exons of 
the present invention may be advantageously used to control the transcriptional and translational 
activity of a heterologous polynucleotide of interest. 

Methods for identifying the relevant polynucleotides comprising biologically active 
g34665 regulatory fragments or variants of SEQ ID No 1 are further described herein. Thus, the 
present invention also relates to a purified or isolated nucleic acid comprising a polynucleotide 
which is selected from the group consisting of the 5' and 3' regulatory regions of g34665, or a 
sequence complementary thereto or a biologically active fragment or variant thereof. 

g35017 

In one aspect, the invention concerns g35017 genomic sequences consisting of, 
consisting essentially of, or comprising the sequence of nucleotide positions 94124 to 94964 of 
SEQ ID No 1, a sequence complementary thereto, as well as fragments and variants thereof. 
These polynucleotides may be purified, isolated, or recombinant. 

The invention also encompasses a purified, isolated, or recombinant polynucleotide 
comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity 
with a nucleotide sequence of nucleotide positions 94124 to 94964 of SEQ ID No 1 or a 
complementary sequence thereto or a fragment thereof. The nucleotide differences with respect to 
nucleotide positions 94124 to 94964 of SEQ ID No 1 may be generally randomly distributed 
throughout the entire nucleic acid. Nevertheless, preferred nucleic acids are those wherein the 
nucleotide differences with respect to the nucleotide sequence of SEQ ID No 1 are predominantly 
located outside the coding sequences contained in the exons. These nucleic acids, as well as 
their fragments and variants, may be used as oligonucleotide primers or probes in order to detect 
the presence of a copy of the g35017 gene in a test sample, or alternatively in order to amplify a 
target nucleotide sequence within the g35017 sequences. 

Another object of the invention consists of a purified, isolated, or recombinant nucleic 
acid that hybridizes with a g35017 nucleotide sequence of nucleotide positions 94124 to 
94964 of SEQ ID No 1 or a complementary sequence thereto or a variant thereof, under the 
stringent hybridization conditions as defined above. 

Particularly preferred nucleic acids of the invention include isolated, purified, or 
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 
40, 50, 60, 70, 80, 90, 100, 150, 200 or 500 nucleotides of nucleotide positions 94124 to 94964 
of SEQ ID No 1 or the complements thereof. Further preferred nucleic acids of the 
invention include isolated, purified, or recombinant polynucleotides comprising a contiguous 
span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200 or 500 
nucleotides of nucleotide positions 94124 to 94964 of SEQ ID No 1 or the complements thereof, 
wherein said contiguous span comprises a g35017-related biallelic marker. Optionally, said 
biallelic marker is the biallelic marker designated A41 in Table 6b. It should be noted that 
nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides 
described in this section. 

While this section is entitled "Genomic Sequences of g35017," it should be noted that 
nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides 
described in this section, flanking the genomic sequences of g35017 on either side or between 
two or more such genomic sequences. 

A g35017 polynucleotide or gene may further contain regulatory sequences both in the 
non-coding 5'-flanking region and in the non-coding 3'-flanking region that border the region 
containing said genes or exons. 

Polynucleotides derived from g35017 5' and 3' regulatory regions are useful in order to 
detect the presence of at least a copy of a nucleotide sequence comprising a g35017 nucleotide 
sequence of SEQ ID No. 1 or a fragment thereof in a test sample. Polynucleotides carrying the 
regulatory elements located at the 5' end and at the 3' end of the genes comprising the exons of 
the present invention may be advantageously used to control the transcriptional and translational 
activity of a heterologous polynucleotide of interest. 

Methods for identifying the relevant polynucleotides comprising biologically active 
regulatory fragments or variants of a g35017 nucleic acid sequence of SEQ ID No 1 are further 
described herein. Thus, the present invention also relates to a purified or isolated nucleic acid 
comprising a polynucleotide which is selected from the group consisting of the 5' and 3' 
regulatory regions, or a sequence complementary thereto or a biologically active fragment or 
variant thereof. In one aspect, the 5' regulatory region may comprise a nucleotide sequence 

g35018 

In one aspect, the invention concerns g35018 genomic sequences consisting of, 
consisting essentially of, or comprising the sequence of nucleotide positions 1108 to 65853 of 
SEQ ID No 1, a sequence complementary thereto, as well as fragments and variants thereof. 
These polynucleotides may be purified, isolated, or recombinant. 

Particularly preferred nucleic acids of the invention include isolated, purified, or 
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 
40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides, to the extent that said span is 
consistent with the nucleotide position range, of SEQ ID No 1, wherein said contiguous span 
comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1108 
to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to 25740, 29388 to 29502, 
29967 to 30282, 64666 to 64812 and 65505 to 65853, or the complements thereof. Further 
preferred nucleic acids of the invention include isolated, purified, or recombinant 
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 
70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of nucleotide positions 1108 to 65853, 1108 
to 1289, 14877 to 14920, 18778 to 18862, 25593 to 25740, 29388 to 29502, 29967 to 30282, 
64666 to 64812 or 65505 to 65853 of SEQ ID No 1, or the complements thereof, wherein said 
contiguous span comprises a g35018-related biallelic marker. Optionally, said biallelic marker 
is selected from the group consisting of A1 to A39. It should be noted that nucleic acid 
fragments of any size and sequence may also be comprised by the polynucleotides described in 
this section. 

The invention also encompasses a purified, isolated, or recombinant polynucleotide
comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity
with a nucleotide sequence of nucleotide positions 31 to 1107, 1108 to 65853, 1108 to 1289,
14877 to 14920, 18778 to 18862, 25593 to 25740, 29388 to 29502, 29967 to 30282, 64666 to
64812, 65505 to 65853 and 65854 to 67854 of SEQ ID No 1, or a complementary sequence
thereto or a fragment thereof. The nucleotide differences with regard to nucleotide positions 31
to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to 25740, 29388
to 29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 and 65854 to 67854 of SEQ ID No
1 may generally be randomly distributed throughout the entire nucleic acid. Nevertheless,
preferred nucleic acids are those wherein the nucleotide differences with regard to the nucleotide
sequence of nucleotide positions 31 to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920,
18778 to 18862, 25593 to 25740, 29388 to 29502, 29967 to 30282, 64666 to 64812, 65505 to
65853 and 65854 to 67854 of SEQ ID No 1 are predominantly located outside the coding
sequences contained in the exons. These nucleic acids, as well as their fragments and variants,
may be used as oligonucleotide primers or probes in order to detect the presence of a copy of the
g35018 gene in a test sample, or alternatively in order to amplify a target nucleotide sequence
within the g35018 sequences.
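The identity thresholds above (70 to 95%) can be made concrete with a short sketch. It assumes the two sequences are already aligned position by position without gaps; the identity measure intended by the document may instead rely on a gapped alignment, so this only illustrates the arithmetic. Function names are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions carrying the same nucleotide.

    Assumes an ungapped, position-by-position alignment of equal-length
    sequences; gapped alignments would need a separate alignment step."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity(seq_a, seq_b, threshold):
    """True if identity is at least `threshold` percent (e.g. 70, 90 or 95)."""
    return percent_identity(seq_a, seq_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # 90.0
print(meets_identity("ACGTACGTAC", "ACGTACGTTC", 95))  # False
```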

Another object of the invention consists of a purified, isolated, or recombinant nucleic
acid that hybridizes with a g35018 nucleotide sequence of any of nucleotide positions 31 to
1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to 25740, 29388 to
29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 or 65854 to 67854 of SEQ ID No 1, or
a complementary sequence thereto or a variant thereof, under the stringent hybridization
conditions as defined above.

Yet further nucleic acids of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60,
70, 80, 90, 100, 150, 200 or 500 nucleotides, to the extent that said span is consistent with the
nucleotide position range, of SEQ ID No 1, wherein said contiguous span comprises at least 1,
2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1255 to 1289, 29967 to
30115 and 30225 to 30282, or the complements thereof, as well as polynucleotides having at least
70, 75, 80, 85, 90, or 95% nucleotide identity with said span and polynucleotides capable of
hybridizing with said span.

The g35018 genomic nucleic acid comprises at least 8 exons. The exon positions in
SEQ ID No 1 are detailed below in Table 3.



Table 3

Exon    Position in SEQ ID No 1     Intron   Position in SEQ ID No 1
        Beginning    End                     Beginning    End
A       1108         1289           A        1290         14876
B       14877        14920          B        14921        18777
Bbis    18778        18862          Bbis     18863        25592
C       25593        25740          C        25741        29387
D       29388        29502          D        29503        29966
E       29967        30282          E        30283        64665
F       64666        64812          F        64813        65504
G       65505        65853


Thus, the invention embodies purified, isolated, or recombinant polynucleotides
comprising a nucleotide sequence selected from the group consisting of the 8 exons of the
g35018 gene, or a sequence complementary thereto. The invention also deals with purified,
isolated, or recombinant nucleic acids comprising a combination of at least two exons of the
g35018 gene, wherein the polynucleotides are arranged within the nucleic acid, from the 5' end
to the 3' end of said nucleic acid, in the same order as in SEQ ID No 1.

Intron A refers to the nucleotide sequence located between exon A and exon B, intron B
to the sequence located between exon B and exon Bbis, and so on. The positions of the introns
are detailed in Table 3. Thus, the invention embodies purified, isolated, or recombinant
polynucleotides comprising a nucleotide sequence selected from the group consisting of the 7
introns of the g35018 gene, or a sequence complementary thereto.
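The intron coordinates in Table 3 follow directly from the exon coordinates: each intron starts one position after an exon ends and stops one position before the next exon begins. A small Python sketch of that derivation, using the exon positions of Table 3 and naming each intron after the preceding exon as the table does (the function name is illustrative):

```python
# g35018 exons (name, beginning, end) in SEQ ID No 1, from Table 3.
G35018_EXONS = [
    ("A", 1108, 1289), ("B", 14877, 14920), ("Bbis", 18778, 18862),
    ("C", 25593, 25740), ("D", 29388, 29502), ("E", 29967, 30282),
    ("F", 64666, 64812), ("G", 65505, 65853),
]


def introns_from_exons(exons):
    """Return (name, beginning, end) for the gap after each exon but the last."""
    introns = []
    for (name, _start, end), (_next, next_start, _next_end) in zip(exons, exons[1:]):
        introns.append((name, end + 1, next_start - 1))
    return introns


for name, start, end in introns_from_exons(G35018_EXONS):
    print(f"Intron {name}: {start}-{end}")
# Intron A: 1290-14876 ... Intron F: 64813-65504
```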

While this section is entitled "Genomic Sequences of g35018," it should be noted that
nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides
described in this section, flanking the genomic sequences of g35018 on either side or between
two or more such genomic sequences.

A g35018 polynucleotide or gene may further contain regulatory sequences both in the
non-coding 5'-flanking region and in the non-coding 3'-flanking region that border the region
containing said genes or exons.

Polynucleotides derived from 5' and 3' regulatory regions are useful in order to detect 
the presence of at least a copy of a nucleotide sequence comprising a g35018 nucleotide
sequence of SEQ ID No. 1 or a fragment thereof in a test sample. Polynucleotides carrying the 
regulatory elements located at the 5' end and at the 3' end of the genes comprising the exons of 
the present invention may be advantageously used to control the transcriptional and translational 
activity of a heterologous polynucleotide of interest. 

Methods for identifying the relevant polynucleotides comprising biologically active 
regulatory fragments or variants of SEQ ID No 1 are further described herein. Thus, the present 
invention also relates to a purified or isolated nucleic acid comprising a polynucleotide which is 
selected from the group consisting of the 5' and 3' regulatory regions, or a sequence 
complementary thereto or a biologically active fragment or variant thereof. 

In one embodiment, a 5' regulatory region may comprise an isolated, purified, or
recombinant polynucleotide comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35,
40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of nucleotide positions 31 to
1107 of SEQ ID No 1, or the complements thereof. In one embodiment, a 3' regulatory region
may comprise an isolated, purified, or recombinant polynucleotide comprising a contiguous
span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000
nucleotides of nucleotide positions 65854 to 67854 of SEQ ID No 1, or the complements
thereof.

Genomic Sequences of sbgl polynucleotides 

Particularly preferred nucleic acids of the invention include isolated, purified, or 
recombinant polynucleotides comprising, consisting essentially of, or consisting of a contiguous 
span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000
nucleotides of nucleotide positions 213818 to 243685 of SEQ ID No 1, or the complements
thereof. Also encompassed are purified, isolated, or recombinant polynucleotides comprising a
nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with 
nucleotide positions 213818 to 243685 of SEQ ID No 1, or a complementary sequence thereto 
or a fragment thereof. Nucleic acids of the invention encompass an sbgl nucleic acid from any 
source, including primate, non-human primate, mammalian and human sbgl nucleic acids. 

Further preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35,
40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the
complements thereof, wherein said contiguous span comprises an sbgl-related biallelic marker.

Optionally, said biallelic marker is selected from the group consisting of A85 to A219.

Optionally, said biallelic marker is selected from the group consisting of A85 to A94, A96 to
A106, A108 to A112, A114 to A177, A179 to A197 and A199 to A219.

It should be noted that nucleic acid fragments of any size and sequence may also be
comprised by the polynucleotides described in this section.

The human sbgl gene comprises exons selected from at least 22 different exons or exon
forms, referred to herein as exons MS1, M1, M692, M862, MS2, M1069, M1090, M1117, N,
N2, Nbis, O, O1, O2, Obis, P, X, Q1, Q, Qbis, Rbis and R. Of these, the following exon sets
contain sequence overlap and do not occur together in an mRNA: exons M1, M692, M862,
MS2, M1090, M1069 and M1117; exons MS1, M1, M692 and M862; exons N and N2; exons
O1 and O2; exons Q and Qbis; exons R and Rbis; and exons Q and Q1.
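The exclusion rules above can be encoded as a lookup. This Python sketch (illustrative names) flags any pair of exons that, according to the paragraph above, do not occur together in one mRNA; it encodes only the sets listed there and checks nothing else about a proposed combination.

```python
# Mutually exclusive sbgl exon sets, as listed in the text.
EXCLUSIVE_SETS = [
    {"M1", "M692", "M862", "MS2", "M1090", "M1069", "M1117"},
    {"MS1", "M1", "M692", "M862"},
    {"N", "N2"},
    {"O1", "O2"},
    {"Q", "Qbis"},
    {"R", "Rbis"},
    {"Q", "Q1"},
]


def conflicting_exons(exons):
    """Return the sorted pairs of exons in `exons` that may not co-occur."""
    present = set(exons)
    conflicts = set()
    for group in EXCLUSIVE_SETS:
        members = sorted(present & group)
        for i, first in enumerate(members):
            for second in members[i + 1:]:
                conflicts.add((first, second))
    return sorted(conflicts)


print(conflicting_exons(["M862", "N", "O", "Qbis", "R"]))  # []
print(conflicting_exons(["M1069", "N", "N2", "Q", "Qbis"]))  # [('N', 'N2'), ('Q', 'Qbis')]
```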

The nucleotide positions of sbgl exons in SEQ ID No. 1 are detailed below in Table 4. 

The exon structure of the sbgl gene is further shown in Figure 1.
Table 4

Exon     Position in SEQ ID No 1
         Beginning    End
R        215819       215941
Rbis     215819       215975
Qbis     216661       216952
Q        216661       217061
Q1       217027       217061
X        229647       229742
P        230408       230721
Obis     231272       231412
O2       231787       231880
O1       231870       231879
O        234174       234321
Nbis     237406       237428
N2       239719       239807
N        239719       239853
M1117    240528       240569
M1090    240528       240596
M1069    240528       240617
MS2      240528       240644
M862     240528       240824
M692     240528       240994
M1       240528       241685
MS1      240800       240993



Thus, the invention embodies purified, isolated, or recombinant polynucleotides
comprising a nucleotide sequence selected from the group consisting of the exons of the sbgl
gene, or a sequence complementary thereto. Preferred are purified, isolated, or recombinant
polynucleotides comprising at least one exon having the nucleotide position ranges listed in
Table 4 selected from the group consisting of the exons MS1, M1, M692, M862, MS2, M1069,
M1090, M1117, N, N2, Nbis, O, O1, O2, Obis, P, X, Q1, Q, Qbis, R and Rbis of the sbgl gene,
or a complementary sequence thereto or a fragment or a variant thereof. Also encompassed by
the invention are purified, isolated, or recombinant nucleic acids comprising a combination of at
least two exons of the sbgl gene selected from the group consisting of exons MS1, M1, M692,
M862, MS2, M1069, M1090, M1117, N, N2, Nbis, O, O1, O2, Obis, P, X, Q1, Q, Qbis, R and
Rbis, wherein the polynucleotides are arranged within the nucleic acid in the same relative order
as in SEQ ID No 1.

Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35,
40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100 or 200 nucleotides of SEQ ID No 1, wherein said
contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ
ID No 1: 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952, 216661
to 217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to 231412, 231787
to 231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719 to 239807, 239719
to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617, 240528 to 240644, 240528
to 240824, 240528 to 240994, 240528 to 241685, 240800 to 240993 and 241686 to 243685, or
the complements thereof.

Another object of the invention consists of a purified, isolated, or recombinant nucleic
acid that hybridizes with an sbgl nucleotide sequence of nucleotide positions 213818 to 243685,
213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952, 216661 to 217061,
217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to 231412, 231787 to 231880,
231870 to 231879, 234174 to 234321, 237406 to 237428, 239719 to 239807, 239719 to 239853,
240528 to 240569, 240528 to 240596, 240528 to 240617, 240528 to 240644, 240528 to 240824,
240528 to 240994, 240528 to 241685, 240800 to 240993 or 241686 to 243685 of SEQ ID No 1,
or a complementary sequence thereto or a variant thereof, under the stringent hybridization
conditions as defined above.

The present invention further embodies purified, isolated, or recombinant 
polynucleotides comprising a nucleotide sequence selected from the group consisting of the 
introns of the sbgl gene, or a sequence complementary thereto. 

In other embodiments, the present invention encompasses the sbgl gene as well as sbgl 
genomic sequences consisting of, consisting essentially of, or comprising the sequence of 
nucleotide positions 215819 to 241685 of SEQ ID No 1, a sequence complementary thereto, as 
well as fragments and variants thereof. 

The invention also encompasses a purified, isolated, or recombinant polynucleotide 
comprising a nucleotide sequence of sbgl having at least 70, 75, 80, 85, 90, or 95% nucleotide 
identity with a sequence selected from the group consisting of nucleotide positions 213818 to 
215818, 215819 to 215941, 215819 to 215975, 216661 to 216952, 216661 to 217061, 217027 to 
217061, 229647 to 229742, 230408 to 230721, 231272 to 231412, 231787 to 231880, 231870 to 
231879, 234174 to 234321, 237406 to 237428, 239719 to 239807, 239719 to 239853, 240528 to 
240569, 240528 to 240596, 240528 to 240617, 240528 to 240644, 240528 to 240824, 240528 to 
240994, 240528 to 241685, 240800 to 240993 and 241686 to 243685 of SEQ ID No. 1 or a 
complementary sequence thereto or a fragment thereof. The nucleotide differences with regard to
the nucleotide positions 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to
216952, 216661 to 217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to
231412, 231787 to 231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719 to
239807, 239719 to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617, 240528 to
240644, 240528 to 240824, 240528 to 240994, 240528 to 241685, 240800 to 240993 and
241686 to 243685 of SEQ ID No. 1 may generally be distributed throughout the nucleic acid.

These nucleic acids, as well as their fragments and variants, may be used as 
oligonucleotide primers or probes in order to detect the presence of a copy of a gene comprising 
an sbgl nucleic acid sequence in a test sample, or alternatively in order to amplify a target 
nucleotide sequence within an sbgl nucleic acid sequence or adjoining region. 

Additional preferred nucleic acids of the invention include isolated, purified, or
recombinant sbgl polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25,
30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100 or 200 nucleotides of nucleotide positions
213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952, 216661 to 217061,
217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to 231412, 231787 to 231880,
231870 to 231879, 234174 to 234321, 237406 to 237428, 239719 to 239807, 239719 to 239853,
240528 to 240569, 240528 to 240596, 240528 to 240617, 240528 to 240644, 240528 to 240824,
240528 to 240994, 240528 to 241685, 240800 to 240993, 215819 to 241685 and 241686 to
243685 of SEQ ID No 1, or the complements thereof, wherein said contiguous span comprises
at least one biallelic marker. Optionally, said contiguous span comprises an sbgl-related
biallelic marker. It should be noted that nucleic acid fragments of any size and sequence may
also be comprised by the polynucleotides described in this section. Either the original or the
alternative allele may be present at said biallelic marker.

Yet further nucleic acids of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60,
70, 80, 90, 100, 150, 200 or 500 nucleotides, to the extent that said span is consistent with the
nucleotide position range, of SEQ ID No 1, wherein said contiguous span comprises at least 1,
2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 215820 to 215941, 216661
to 217009, 230409 to 230721, 231272 to 231411, 234202 to 234321, 240528 to 240567, 240528
to 240827 and 240528 to 240996, or the complements thereof, as well as polynucleotides
having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with said span, and
polynucleotides capable of hybridizing with said span.

The present invention also comprises a purified or isolated nucleic acid encoding an 
sbgl protein having the amino acid sequence of any one of SEQ ID Nos 27 to 35 or a peptide 
fragment or variant thereof. 

While this section is entitled "Genomic Sequences of sbgl," it should be noted that 
nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides 
described in this section, flanking the genomic sequences of sbgl on either side or between two or
more such genomic sequences. 

Sbgl cDNA Sequences 

The expression of the sbgl gene has been shown to lead to the production of several
mRNA species. Several cDNA sequences corresponding to these mRNAs are set forth in SEQ
ID Nos 2 to 26.

The invention encompasses a purified, isolated, or recombinant nucleic acid comprising 
a nucleotide sequence selected from the group consisting of SEQ ID Nos 2 to 26, 
complementary sequences thereto, splice variants thereof, as well as allelic variants, and 
fragments thereof. Moreover, preferred polynucleotides of the invention include purified,
isolated, or recombinant sbgl cDNAs consisting of, consisting essentially of, or comprising a
nucleotide sequence selected from the group consisting of SEQ ID Nos 2 to 26. Particularly
preferred nucleic acids of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 8, 12, 15, 18, 20, 25, 30, 35, 40, 50, 60,
70, 75, 80, 100, 200 or 500 nucleotides, to the extent that the length of said contiguous span is
consistent with the length of the SEQ ID, of a nucleotide sequence selected from the group
consisting of SEQ ID Nos 2 to 26, or the complements thereof.

It should be noted that nucleic acid fragments of any size and sequence may also be 
comprised by the polynucleotides described in this section. 

The invention also pertains to a purified or isolated nucleic acid comprising a 
polynucleotide having at least 70, 80, 85, 90 or 95% nucleotide identity with a polynucleotide 
selected from the group consisting of SEQ ID Nos 2 to 26, advantageously 99% nucleotide
identity, preferably 99.5% nucleotide identity and most preferably 99.8% nucleotide identity
with a polynucleotide selected from the group consisting of SEQ ID Nos 2 to 26, or a sequence 
complementary thereto or a biologically active fragment thereof. 

Another object of the invention relates to purified, isolated or recombinant nucleic acids 
comprising a polynucleotide that hybridizes, under the stringent hybridization conditions
defined herein, with a polynucleotide selected from the group consisting of SEQ ID Nos 2 to 26,
or a sequence complementary thereto or a variant thereof or a biologically active fragment 
thereof. 

The sbgl cDNA forms of SEQ ID Nos 2 to 26 are further described in Table 5a below.
Table 5a shows the positions of the 5' UTR, the open reading frame (ORF), the 3' UTR and
the polyA signal on the respective SEQ ID No. Also shown are the sbgl exons comprising the
cDNA form of a particular SEQ ID No.



Table 5a

SEQ     cDNA              Pos range      Pos range      Pos range      Pos range of
ID No                     of 5'UTR       of ORF         of 3'UTR       polyA signal
2       M862NOQbisR       1-253          254-304        305-995        971-976
3       M862NOObisP       1-253          254-304        305-1035       1020-1025
4       M1                1-187          188-520        521-1158
5       M862NOP           1-253          254-304        305-894        879-884
6       M1090NOXQbisR     1-25           26-76          77-863         839-844
7       M1117N2OO1P                      2-310          311-603        588-593
8       M1117N2OP                        2-358          359-593        578-583
9       M1117NOO1P                       2-49           50-649         634-639
10      M1117NOO2P                       2-49           50-733         718-723
11      MS1MS2NOQbisR     1-267          268-318        319-1009       985-990
12      M1069NOQR         1-46           47-97          98-897         873-878
13      M1069N2OQ1QbisR   1-46           47-343         344-777        753-758
14      M1069NOQ1QbisR    1-46           47-97          98-823         799-804
15      M1069N2OO2QbisR   1-46           47-427         428-836        812-817
16      M1069NOO2QbisR    1-46           47-

Primers used to isolate the particular sbgl cDNAs listed above from RNA from various
tissues are provided below in Table 5b. Primers designed to hybridize to nucleic acid sequences
of exons MS1, M862, M1090, M1117 and MS2, and exons P and R, resulted in the cloning of
multiple cDNA forms for several sets of primers. The primers used are listed in SEQ ID Nos 44
to 53.
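The cDNA form names in Table 5a appear to list the exons of each form from 5' to 3'. Assuming each form consists exactly of those exons, summing the exon lengths from Table 4 should give the last position of the 3' UTR. The sketch below (illustrative names) checks four forms; with the Table 4 coordinates it returns 995, 1035, 603 and 777, the 3' UTR end positions given for SEQ ID Nos 2, 3, 7 and 13.

```python
# Exon coordinates (Beginning, End) in SEQ ID No 1, taken from Table 4
# (only the exons needed for the forms checked below).
EXONS = {
    "M862": (240528, 240824), "M1069": (240528, 240617), "M1117": (240528, 240569),
    "N": (239719, 239853), "N2": (239719, 239807), "O": (234174, 234321),
    "O1": (231870, 231879), "Obis": (231272, 231412), "P": (230408, 230721),
    "Q1": (217027, 217061), "Qbis": (216661, 216952), "R": (215819, 215941),
}


def cdna_length(exons):
    """Sum of the lengths of the listed exons (inclusive coordinates)."""
    return sum(EXONS[name][1] - EXONS[name][0] + 1 for name in exons)


# SEQ ID No -> exon composition, read off the cDNA form names in Table 5a.
FORMS = {
    2: ["M862", "N", "O", "Qbis", "R"],
    3: ["M862", "N", "O", "Obis", "P"],
    7: ["M1117", "N2", "O", "O1", "P"],
    13: ["M1069", "N2", "O", "Q1", "Qbis", "R"],
}

for seq_id, exons in FORMS.items():
    print(seq_id, "".join(exons), cdna_length(exons))
# 2 M862NOQbisR 995
# 3 M862NOObisP 1035
# 7 M1117N2OO1P 603
# 13 M1069N2OQ1QbisR 777
```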

mRNA forms of sbgl were found to differ among tissues; Table 5c lists cDNA forms 
cloned from various tissues and the relative percentages and numbers of clones found per tissue 
for each listed sbgl mRNA form. 

The present inventors have also identified further variations in cDNA sequence as 
obtained from various tissues and compared with the consensus sbgl genomic nucleotide 
sequence. The tissues from which cDNA was cloned were pooled from 11 to 60 individuals.
Table 5d below describes the identities of variants, the nucleotide
position of the variation in nucleotide sequence of SEQ ID No 2, and the number of samples
having the specified sequence for each respective nucleotide position on the sbgl cDNA
sequence of SEQ ID No. 2. Also indicated in Table 5d are amino acid changes in the
corresponding sbgl polypeptide sequence (described herein), if any, resulting from the
nucleotide sequence variations in the cDNA of SEQ ID No 2.

These variants may represent rare polymorphisms or may be the result of tissue-specific
RNA editing. Alternatively, some variations may be the result of the presence in the human
genome of one or more sbgl-related genes or a small family of sbgl-related genes with strict
tissue specificity of expression and small variation in gene structure. The applicants tested the
latter hypothesis for the case where the exon-intron structure of these genes is identical,
demonstrating that variations in at least exons M and N are not the result of the presence of
related genes.

The present invention thus further encompasses variant sbgl polynucleotides having at 
least one nucleotide substitution as described in Table 5d below. The nucleotide and amino acid 
variations as shown in Table 5d are shown in terms of the nucleotide sequence of SEQ ID No. 2, 
and specify variations as found in exons M862, N, O, Qbis and R. The invention encompasses 
purified, isolated, or recombinant polynucleotides and polypeptides encoded thereby, wherein 
the polynucleotides comprise a contiguous span of at least 8, 12, 15, 18, 20, 25, 30, 35, 40, 45,
50, 60, 70, 80, 100, 150, or 200 nucleotides of SEQ ID No 2 or the complement thereof, and
wherein said contiguous span further comprises a nucleotide sequence variation according to
Table 5d.
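A polynucleotide of this kind can be sketched as a window of SEQ ID No 2 around the varied position with the alternative nucleotide substituted. Table 5d itself is not reproduced in this section, so the example uses a toy sequence; the function name and the rule of roughly centring the window on the variant are illustrative choices, not taken from the text.

```python
def variant_window(seq, position, ref, alt, length):
    """Return a window of `length` nucleotides from seq containing the
    1-based `position`, roughly centred on it, with `ref` replaced by `alt`."""
    if not 1 <= position <= len(seq):
        raise ValueError("position outside the sequence")
    if seq[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    if length > len(seq):
        raise ValueError("window longer than the sequence")
    # Clamp the window so it stays inside the sequence near either end.
    start = max(0, min(position - 1 - length // 2, len(seq) - length))
    window = seq[start:start + length]
    index = position - 1 - start
    return window[:index] + alt + window[index + 1:]


toy = "ACGTACGTACGT"  # toy stand-in; SEQ ID No 2 is not reproduced here
print(variant_window(toy, 6, "C", "T", 5))  # TATGT
```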

The present invention comprises a purified or isolated sbgl cDNA encoding an sbgl 
protein or a peptide fragment or variant thereof. In one embodiment, a purified or isolated
nucleic acid may encode an sbgl protein having the amino acid sequence of any of SEQ ID Nos
27 to 35 or a peptide fragment or variant thereof.

Preferred nucleic acids of the invention also include isolated, purified, or recombinant 
polynucleotides comprising a contiguous span of at least 8, 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 
70, 75, 80, 100, 200 or 500 nucleotides of a nucleotide sequence selected from the group 
consisting of SEQ ID Nos 2 to 26, or the complements thereof, wherein said span comprises an
sbgl-related biallelic marker of the invention. The positions of selected biallelic markers of the
invention in sbgl cDNA sequences and polypeptide sequences are listed below in Table 5e.
Said contiguous span may comprise a biallelic marker selected from the group of biallelic
markers listed in Table 5e; optionally, said biallelic marker is selected from the group consisting
of the biallelic markers located in an sbgl cDNA form, as listed in Table 5e; optionally, said
biallelic marker is selected from the group consisting of the biallelic markers located in an sbgl
coding sequence, as listed in Table 5e.
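The cDNA positions in Table 5e can be related to the exon coordinates in Table 4. The sketch below makes two assumptions that the text does not state outright: that a cDNA form name lists its exons from 5' to 3', and that sbgl is transcribed toward decreasing positions of SEQ ID No 1 (its 3' regulatory region, positions 213818 to 215818, lies below exon R). Under those assumptions, marker 8-132-179 at position 215838 maps to position 976 of M862NOQbisR, as listed in Table 5e. Function names are illustrative.

```python
import re

# Exon coordinates (Beginning, End) in SEQ ID No 1, transcribed from Table 4.
SBGL_EXONS = {
    "R": (215819, 215941), "Rbis": (215819, 215975), "Qbis": (216661, 216952),
    "Q": (216661, 217061), "Q1": (217027, 217061), "X": (229647, 229742),
    "P": (230408, 230721), "Obis": (231272, 231412), "O2": (231787, 231880),
    "O1": (231870, 231879), "O": (234174, 234321), "Nbis": (237406, 237428),
    "N2": (239719, 239807), "N": (239719, 239853), "M1117": (240528, 240569),
    "M1090": (240528, 240596), "M1069": (240528, 240617), "MS2": (240528, 240644),
    "M862": (240528, 240824), "M692": (240528, 240994), "M1": (240528, 241685),
    "MS1": (240800, 240993),
}

# Try longer exon names first so that "Qbis" wins over "Q" and "M1069" over "M1".
_EXON_NAME = re.compile("|".join(sorted(SBGL_EXONS, key=len, reverse=True)))


def parse_form(form):
    """Split a cDNA form name such as 'M862NOQbisR' into its exons, 5' to 3'."""
    exons = _EXON_NAME.findall(form)
    if "".join(exons) != form:
        raise ValueError(f"cannot parse cDNA form {form!r}")
    return exons


def genomic_to_cdna(form, position):
    """Return the 1-based cDNA position of a SEQ ID No 1 position, or None if
    it is not exonic in this form. Assumes each exon is read from its End
    coordinate down to its Beginning (minus-strand orientation)."""
    offset = 0
    for exon in parse_form(form):
        start, end = SBGL_EXONS[exon]
        if start <= position <= end:
            return offset + (end - position + 1)
        offset += end - start + 1
    return None


for form in ["M862NOQbisR", "M1090NOXQbisR", "M1069NOQbisRbis"]:
    print(form, genomic_to_cdna(form, 215838))
# M862NOQbisR 976
# M1090NOXQbisR 844
# M1069NOQbisRbis 803
```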

Expression of sbgl mRNA was further confirmed by Northern blotting. Using a probe 
corresponding to exon O of the sbgl gene, a band corresponding to an sbgl mRNA was 
detected. 

While this section is entitled "sbgl cDNA Sequences," it should be noted that nucleic
acid fragments of any size and sequence may also be comprised by the polynucleotides
described in this section, flanking the genomic sequences of sbgl on either side or between two
or more such genomic sequences.

Table 5e

Amplicon  Biallelic     Allele 1  Allele 2  Genomic position  cDNA form: position of marker
          marker name                       on SEQ ID No 1    on cDNA (position in polypeptide)
8-132     8-132-179     A         T         215838            M862NOQbisR:976
                                                              M1090NOXQbisR:844
                                                              MS1MS2NOQbisR:990
                                                              M1069NOQR:878
                                                              M1069N2OQ1QbisR:758
                                                              M1069NOQ1QbisR:804
                                                              M1069N2OO2QbisR:817
                                                              M1069NOO2QbisR:863
                                                              M1069N2NbisOO2XQbisR:936
                                                              M1069N2OQR:832
                                                              M1069N2OQbisR:723
                                                              M1069NNbisOQR:901
                                                              M1069NNbisOQbisR:792
                                                              M1069NOO2XQbisR:959
                                                              M1069NOXQR:974
                                                              M1069NOQbisRbis:803

Sbgl Coding Regions

The sbgl open reading frame is contained in the corresponding mRNA of a cDNA 
sequence selected from the group consisting of SEQ ID Nos 2 to 26. The effective sbgl coding 
sequence (CDS) may take several forms, as indicated above. In some embodiments, the invention
encompasses isolated, purified, and recombinant polynucleotides which encode a polypeptide
comprising a contiguous span of at least 4 amino acids, preferably 6, more preferably at least 8 
or 10 amino acids, yet more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of 
SEQ ID Nos 27 to 35. The effective sbgl coding sequence (CDS) may comprise the region 
between the first nucleotide of the ATG codon and the end nucleotide of the stop codon of SEQ 
ID Nos 2 to 26 as indicated in Table 5a above. 
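Since the CDS runs from the first nucleotide of the ATG codon to the last nucleotide of the stop codon, its length must be a multiple of 3, and the encoded polypeptide has one residue fewer than the number of codons. A hedged Python sketch of those checks follows; the function names are illustrative and the toy sequence is invented for the example. The range 47 to 343 is the ORF listed in Table 5a for SEQ ID No 13.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_summary(orf_start, orf_end):
    """Length in nucleotides, number of codons (stop included) and number of
    encoded amino acids for a 1-based inclusive ORF range."""
    length = orf_end - orf_start + 1
    if length % 3:
        raise ValueError(f"ORF length {length} is not a multiple of 3")
    codons = length // 3
    return length, codons, codons - 1


def check_orf(cdna, orf_start, orf_end):
    """List problems with an ORF range on a cDNA string (empty list if none)."""
    orf = cdna[orf_start - 1:orf_end].upper()
    problems = []
    if len(orf) % 3:
        problems.append("length not a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("does not start with ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        problems.append("internal stop codon")
    return problems


print(orf_summary(47, 343))  # (297, 99, 98)
print(check_orf("GGATGAAATTTTAAGG", 3, 14))  # []
```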

The above disclosed polynucleotide that contains the coding sequence of the sbgl gene 
may be expressed in a desired host cell or a desired host organism when this polynucleotide is 
placed under the control of suitable expression signals. The expression signals may be either
the expression signals contained in the regulatory regions of the sbgl gene of the invention or,
alternatively, exogenous regulatory nucleic acid sequences. Such a polynucleotide, when placed
under suitable expression signals, may also be inserted in a vector for its expression and/or
amplification.

Regulatory Sequences Of sbgl 

As mentioned, the genomic sequence of the sbgl gene contains regulatory sequences 
both in the non-coding 5'-flanking region and in the non-coding 3'-flanking region that border
the sbgl coding region containing the exons of the gene. 

In one aspect, the 3'-regulatory sequence of the sbgl gene may comprise the sequence
localized between the nucleotide in position 213818 and the nucleotide in position 215818 of
the nucleotide sequence of SEQ ID No 1. In one aspect, the 5'-regulatory sequence of the sbgl
gene may comprise the sequence localized between the 5' end of the particular form of exon M
and nucleotide position 243685 of SEQ ID No 1.

Polynucleotides derived from the 5' and 3' regulatory regions are useful in order to 
detect the presence of at least a copy of an sbgl nucleotide sequence of SEQ ID No 1 or a 
fragment thereof in a test sample. 

The promoter activity of the 5' regulatory regions contained in sbgl can be assessed as 
described below. 

In order to identify the relevant biologically active polynucleotide fragments or variants
of an sbgl regulatory region, one of skill in the art will refer to Sambrook et al. (1989), which
describes the use of a recombinant vector carrying a marker gene (e.g. beta-galactosidase,
chloramphenicol acetyltransferase, etc.) the expression of which will be detected when placed
under the control of a biologically active polynucleotide fragment or variant of the sbgl
sequence of SEQ ID No 1. Genomic sequences located upstream of the first exon of the sbgl
gene are cloned into a suitable promoter reporter vector, such as the pSEAP-Basic, pSEAP-Enhancer,
pβgal-Basic, pβgal-Enhancer, or pEGFP-1 Promoter Reporter vectors available from
Clontech, or the pGL2-basic or pGL3-basic promoterless luciferase reporter gene vectors from
Promega. Briefly, each of these promoter reporter vectors includes multiple cloning sites
positioned upstream of a reporter gene encoding a readily assayable protein such as secreted
alkaline phosphatase, luciferase, β-galactosidase, or green fluorescent protein. The sequences
upstream of the sbgl coding region are inserted into the cloning sites upstream of the reporter
gene in both orientations and introduced into an appropriate host cell. The level of reporter
protein is assayed and compared to the level obtained from a vector which lacks an insert in the
cloning site. An elevated expression level in the vector containing the insert, relative to the
control vector, indicates the presence of a promoter in the insert. If
necessary, the upstream sequences can be cloned into vectors which contain an enhancer for
increasing transcription levels from weak promoter sequences. A significant level of expression
above that observed with the vector lacking an insert indicates that a promoter sequence is
present in the inserted upstream sequence.
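The comparison with the insert-less vector can be expressed as a fold-over-background score for each orientation. In this Python sketch the 2-fold cut-off and all readings are hypothetical, chosen only to illustrate the calculation; the text itself only asks for an elevated or significant level above the control.

```python
from statistics import mean


def fold_over_empty(insert_readings, empty_readings):
    """Mean reporter signal of an insert construct divided by the mean
    signal of the vector lacking an insert."""
    baseline = mean(empty_readings)
    if baseline <= 0:
        raise ValueError("empty-vector baseline must be positive")
    return mean(insert_readings) / baseline


def has_promoter_activity(insert_readings, empty_readings, min_fold=2.0):
    """Hypothetical decision rule: call promoter activity at >= min_fold."""
    return fold_over_empty(insert_readings, empty_readings) >= min_fold


# Hypothetical reporter readings (arbitrary units), triplicate transfections.
empty_vector = [100.0, 110.0, 90.0]
insert_forward = [850.0, 900.0, 950.0]
insert_reverse = [120.0, 95.0, 105.0]

for label, readings in [("forward", insert_forward), ("reverse", insert_reverse)]:
    fold = fold_over_empty(readings, empty_vector)
    print(label, round(fold, 2), has_promoter_activity(readings, empty_vector))
# forward 9.0 True
# reverse 1.07 False
```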

Promoter sequence within the upstream genomic DNA may be further defined by 
constructing nested 5' and/or 3' deletions in the upstream DNA using conventional techniques 
such as Exonuclease III or appropriate restriction endonuclease digestion. The resulting 
deletion fragments can be inserted into the promoter reporter vector to determine whether the 
deletion has reduced or obliterated promoter activity, as described, for example, by Coles
et al. (1998). In this way, the boundaries of the promoters may be defined. If desired, potential
individual regulatory sites within the promoter may be identified using site-directed mutagenesis
or linker scanning to obliterate potential transcription factor binding sites within the promoter 
individually or in combination. The effects of these mutations on transcription levels may be 
determined by inserting the mutations into cloning sites in promoter reporter vectors. This type 
of assay is well-known to those skilled in the art and is described in WO 97/17359, US Patent 
No. 5,374,544; EP 582 796; US Patent No. 5,698,389; US 5,643,746; US Patent No. 5,502,176; 
and US Patent 5,266,488. 
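For a series of nested 5' deletions that share one 3' end, the boundary of a required element can be bracketed between the last construct that keeps activity and the first that loses it. A minimal Python sketch, assuming activity is lost monotonically as more sequence is deleted and positions are numbered along the cloned upstream fragment; the fold values and the 2-fold cut-off are hypothetical.

```python
def essential_interval(series, min_fold=2.0):
    """series: (start, fold) pairs for 5' deletion constructs that all end at
    the same 3' position; start is the first retained position, so larger
    starts mean more sequence deleted. Returns the (first, last) positions
    whose removal abolished activity, or None if no transition is seen."""
    last_active = None
    for start, fold in sorted(series):
        if fold >= min_fold:
            last_active = start
        elif last_active is not None:
            return (last_active, start - 1)
        else:
            return None  # even the longest construct is inactive
    return None  # activity retained across the whole series


# Hypothetical fold-over-empty-vector values for five nested deletions.
deletions = [(1, 12.0), (201, 11.5), (401, 10.8), (601, 1.2), (801, 1.0)]
print(essential_interval(deletions))  # (401, 600)
```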

The strength and the specificity of the promoter of the sbgl gene can be assessed 
through the expression levels of a detectable polynucleotide operably linked to the sbgl 
promoter in different types of cells and tissues. The detectable polynucleotide may be either a 
polynucleotide that specifically hybridizes with a predefined oligonucleotide probe, or a 
polynucleotide encoding a detectable protein, including an sbgl polypeptide or a fragment or a 
variant thereof. This type of assay is well-known to those skilled in the art and is described in 
US Patent No. 5,502,176; and US Patent No. 5,266,488. Some of the methods are discussed in 
more detail below. 

Polynucleotides carrying the regulatory elements located at the 5' end and at the 3' end
of the sbgl coding region may be advantageously used to control the transcriptional and
translational activity of a heterologous polynucleotide of interest.

Thus, the present invention also concerns a purified or isolated nucleic acid comprising 
a polynucleotide which is selected from the group consisting of the 5' and 3' regulatory regions 
of sbgl, or a sequence complementary thereto or a biologically active fragment or variant 
thereof. In one aspect, the "3' regulatory region" may comprise the nucleotide sequence located
between positions 213818 and 215818 of SEQ ID No 1. In one aspect, the "5' regulatory region"
may comprise the nucleotide sequence located between the 5' end of a particular variant of exon
M and nucleotide position 243685 of SEQ ID No 1. The 5' end of a particular form of exon M
may be selected from the group consisting of nucleotide positions 240569, 240596, 240617,
240644, 240824, 240994, 241685 and 240993 of SEQ ID No 1. In a preferred aspect, the 5'
regulatory region comprises nucleotide positions 241686 to 243685 of SEQ
ID No 1.
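Nucleotide positions in SEQ ID No 1 are 1-based. The ranges are read here as inclusive at both ends, which is an assumption about the convention. Under that reading, converting a range to a Python slice requires shifting the start by one, and the length is end minus start plus one. In the sketch below, `toy_seq` is a hypothetical placeholder, not the actual sequence.

```python
def region_length(start, end):
    """Number of nucleotides in the 1-based, inclusive range start..end."""
    if start > end:
        raise ValueError("start must not exceed end")
    return end - start + 1


def extract_region(seq, start, end):
    """Return positions start..end (1-based, inclusive) of seq as a Python slice."""
    if start < 1 or end > len(seq) or start > end:
        raise IndexError("range lies outside the sequence")
    return seq[start - 1:end]


# Ranges quoted in the text for the sbgl regulatory regions
print(region_length(241686, 243685))  # preferred 5' regulatory region: 2000
print(region_length(213818, 215818))  # 3' regulatory region: 2001

# Hypothetical 12-nt stand-in for SEQ ID No 1
toy_seq = "ACGTACGTTTGA"
print(extract_region(toy_seq, 3, 6))  # GTAC
```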

The invention also pertains to a purified or isolated nucleic acid comprising a 
polynucleotide having at least 95% nucleotide identity with a polynucleotide selected from the 
group consisting of the 5' and 3' regulatory regions, advantageously 99% nucleotide identity,
preferably 99.5% nucleotide identity and most preferably 99.8% nucleotide identity with a 
polynucleotide selected from the group consisting of the 5' and 3' regulatory regions, or a 
sequence complementary thereto or a variant thereof or a biologically active fragment thereof. 
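Identity thresholds such as 95%, 99%, 99.5% or 99.8% presuppose a way of counting matching positions. This passage does not fix an alignment method. The Python sketch below therefore assumes that the two sequences are already aligned and of equal length, with gaps written as '-'. It counts identical columns over the full alignment length, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical columns in a pairwise alignment of equal length.

    A gap ('-') in either sequence counts as a mismatch at that column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


reference = "ACGT" * 250        # hypothetical 1,000-nt reference
variant = "T" + reference[1:]   # a single substitution at position 1
pid = percent_identity(reference, variant)
print(pid)  # 99.9
for threshold in (95, 99, 99.5, 99.8):
    print(f"at least {threshold}% identity: {pid >= threshold}")
```

On a 2,000-nt regulatory region, 99.8% identity would correspondingly allow at most four differing positions under this convention.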

Another object of the invention consists of purified, isolated or recombinant nucleic 
acids comprising a polynucleotide that hybridizes, under the stringent hybridization conditions 
defined herein, with a polynucleotide selected from the group consisting of the nucleotide 
sequences of the 5' and 3' regulatory regions of sbgl, or a sequence complementary thereto or
a variant thereof or a biologically active fragment thereof. 

Preferred fragments of the 5' regulatory region have a length of about 1500 or 1000 
nucleotides, preferably of about 500 nucleotides, more preferably about 400 nucleotides, even 
more preferably about 300 nucleotides and most preferably about 200 nucleotides.

Preferred fragments of the 3' regulatory region are at least 50, 100, 150, 200, 300 or
400 bases in length.

"Biologically active" sbgl polynucleotide derivatives of SEQ ID No 1 are
polynucleotides comprising, or alternatively consisting of, a fragment of said polynucleotide
which is functional as a regulatory region for expressing a recombinant polypeptide or a
recombinant polynucleotide in a recombinant cell host. Such a fragment may act either as an
enhancer or as a repressor.

For the purpose of the invention, a nucleic acid or polynucleotide is "functional" as a
regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide if
said regulatory polynucleotide contains nucleotide sequences which contain transcriptional and
translational regulatory information, and such sequences are "operably linked" to nucleotide
sequences which encode the desired polypeptide or the desired polynucleotide.

The regulatory polynucleotides of the invention may be prepared from the nucleotide
sequence of SEQ ID No 1 by cleavage using suitable restriction enzymes, as described for
example in Sambrook et al. (1989). The regulatory polynucleotides may also be prepared by
digestion of SEQ ID No 1 with an exonuclease enzyme, such as Bal31 (Wabiko et al., 1986).
These regulatory polynucleotides can also be prepared by nucleic acid chemical synthesis, as
described elsewhere in the specification.
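Preparing regulatory fragments by restriction digestion starts with locating the recognition sites. As an illustration, the Python sketch below scans a sequence for EcoRI (G^AATTC) and BamHI (G^GATCC) sites and returns the fragments that a complete digest would yield. The enzyme choice and the toy sequence are assumptions. Only the top strand of a linear molecule is modelled, which is sufficient here because both recognition sequences are palindromic.

```python
import re

# Recognition sequence and top-strand cut offset within it
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def cut_positions(seq, enzymes):
    """0-based positions in seq immediately before which the top strand is cut."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # Zero-width lookahead so that overlapping occurrences are also found
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def digest(seq, enzymes):
    """Top-strand fragments of a linear molecule after complete digestion."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


toy = "TTGAATTCAAAAGGATCCTT"  # hypothetical 20-nt sequence
print(cut_positions(toy, ["EcoRI", "BamHI"]))  # [3, 13]
print(digest(toy, ["EcoRI", "BamHI"]))         # ['TTG', 'AATTCAAAAG', 'GATCCTT']
```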

The sbgl regulatory polynucleotides according to the invention may be part of a 
recombinant expression vector that may be used to express a coding sequence in a desired host 
cell or host organism. The recombinant expression vectors according to the invention are 
described elsewhere in the specification. 

A preferred 5'-regulatory polynucleotide of the invention includes the 5'-untranslated 
region (5'-UTR) of the sbgl cDNA, or a biologically active fragment or variant thereof. 

A preferred 3'-regulatory polynucleotide of the invention includes the 3'-untranslated
region (3'-UTR) of the sbgl cDNA, or a biologically active fragment or variant thereof. 

A further object of the invention consists of a purified or isolated nucleic acid 
comprising: 

a) a nucleic acid comprising a regulatory nucleotide sequence selected from the group 
consisting of: 

(i) a nucleotide sequence comprising a polynucleotide of the sbgl 5' regulatory region or a 
complementary sequence thereto; 

(ii) a nucleotide sequence comprising a polynucleotide having at least 95% nucleotide
identity with the nucleotide sequence of the sbgl 5' regulatory region or a complementary 
sequence thereto; 

(iii) a nucleotide sequence comprising a polynucleotide that hybridizes under stringent 
hybridization conditions with the nucleotide sequence of the sbgl 5' regulatory region or a 
complementary sequence thereto; and 

(iv) a biologically active fragment or variant of the polynucleotides in (i), (ii) and (iii); 

b) a polynucleotide encoding a desired polypeptide or a nucleic acid of interest, 
operably linked to the nucleic acid defined in (a) above; and 

c) optionally, a nucleic acid comprising a 3'-regulatory polynucleotide, preferably a 3'-regulatory
polynucleotide of the sbgl gene.

In a specific embodiment of the nucleic acid defined above, said nucleic acid includes 
the 5'-untranslated region (5'-UTR) of the sbgl cDNA, or a biologically active fragment or 
variant thereof. 

In a second specific embodiment of the nucleic acid defined above, said nucleic acid 
includes the 3'-untranslated region (3'-UTR) of the sbgl cDNA, or a biologically active
fragment or variant thereof. 

The regulatory polynucleotide of the 5' regulatory region, or its biologically active
fragments or variants, is operably linked at the 5'-end of the polynucleotide encoding the
desired polypeptide or polynucleotide.

The regulatory polynucleotide of the 3' regulatory region, or its biologically active 
fragments or variants, is advantageously operably linked at the 3'-end of the polynucleotide
encoding the desired polypeptide or polynucleotide. 

The desired polypeptide encoded by the above-described nucleic acid may be of various
natures or origins, encompassing proteins of prokaryotic or eukaryotic origin. Polypeptides
that may be expressed under the control of an sbgl regulatory region include bacterial, fungal
or viral antigens. Also encompassed are eukaryotic proteins such as intracellular proteins, like
"housekeeping" proteins, membrane-bound proteins, like receptors, and secreted proteins like
endogenous mediators such as cytokines. The desired polypeptide may be the sbgl protein,
especially the protein of the amino acid sequences of SEQ ID Nos 27 to 35, or a fragment or a 
variant thereof. 

The desired nucleic acid encoded by the above-described polynucleotide, usually an
RNA molecule, may be complementary to a desired coding polynucleotide, for example to the 
sbgl coding sequence, and thus useful as an antisense polynucleotide. 

Such a polynucleotide may be included in a recombinant expression vector in order to 
express the desired polypeptide or the desired nucleic acid in a host cell or in a host organism.
Suitable recombinant vectors that contain a polynucleotide such as described herein are 
disclosed elsewhere in the specification. 

Genomic Sequences of sbg2 polynucleotides 

Particularly preferred sbg2 nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35,
40, 50, 60, 70, 80, 90, 100, 150 or 200 nucleotides of nucleotide positions 201188 to 216915,
201188 to 201234, 214676 to 214793, 215702 to 215746 and 216836 to 216915 of SEQ ID No
1, or the complements thereof, to the extent that the length of said span is consistent with the
nucleotide position range concerned.
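The proviso on span length excludes spans longer than the range they are drawn from. Reading the ranges as 1-based and inclusive (an assumption), the Python sketch below lists which of the recited span lengths fit within each sbg2 range. For example, the 45-nt range 215702 to 215746 cannot hold a 50-nt span.

```python
SPAN_LENGTHS = [12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200]

SBG2_RANGES = [
    (201188, 216915),
    (201188, 201234),
    (214676, 214793),
    (215702, 215746),
    (216836, 216915),
]


def feasible_spans(start, end, lengths=SPAN_LENGTHS):
    """Recited span lengths that fit inside the inclusive range [start, end]."""
    size = end - start + 1
    return [n for n in lengths if n <= size]


for start, end in SBG2_RANGES:
    size = end - start + 1
    print(f"{start}-{end} ({size} nt): longest span {max(feasible_spans(start, end))} nt")
```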

It should be noted that nucleic acid fragments of any size and sequence may be 
comprised by the polynucleotides described in this section. 

The human sbg2 gene comprises at least 4 exons, referred to herein as exons S, T, U
and V. The nucleotide positions of the sbg2 exons in SEQ ID No. 1 are detailed below in
Table 5f.



Table 5f

Exon   Position in SEQ ID No 1    Intron   Position in SEQ ID No 1
       Beginning   End                     Beginning   End
S      201188      201234         S        201235      214675
T      214676      214793         T        214794      215701
U      215702      215746         U        215747      216835
V      216836      216915

Thus, the invention embodies purified, isolated, or recombinant polynucleotides
comprising a nucleotide sequence selected from the group consisting of the exons of the sbg2
gene, or a sequence complementary thereto. Preferred are purified, isolated, or recombinant
polynucleotides comprising at least one exon having the nucleotide position ranges listed in
Table 5f selected from the group consisting of the exons S, T, U and V of the sbg2 gene, or a
complementary sequence thereto or a fragment or a variant thereof. Also encompassed by the
invention are purified, isolated, or recombinant nucleic acids comprising a combination of at
least two exons of the sbg2 gene selected from the group consisting of exons S, T, U and V,
wherein the polynucleotides are arranged within the nucleic acid in the same relative order as in
SEQ ID No. 1.
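Joining a selection of exons in the same relative order as in SEQ ID No. 1 can be expressed as sorting the chosen exons by start coordinate before concatenating them. The Python sketch below assumes 1-based inclusive coordinates. It uses a short hypothetical sequence and made-up exon coordinates in place of SEQ ID No. 1 and Table 5f.

```python
def join_exons(seq, exons, chosen):
    """Concatenate the chosen exons of seq in genomic order.

    exons: mapping of exon name -> (start, end), 1-based inclusive.
    chosen: iterable of exon names, in any order.
    """
    ordered = sorted(chosen, key=lambda name: exons[name][0])
    return "".join(seq[exons[name][0] - 1:exons[name][1]] for name in ordered)


# Hypothetical toy gene: uppercase = exon, lowercase = intron
toy_seq = "AAAcccGGGtttTTT"
toy_exons = {"S": (1, 3), "T": (7, 9), "U": (13, 15)}
print(join_exons(toy_seq, toy_exons, ["U", "S"]))  # AAATTT
```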

The present invention further embodies purified, isolated, or recombinant
polynucleotides comprising a nucleotide sequence selected from the group consisting of the
introns of the sbg2 gene, or a sequence complementary thereto. The positions of the introns are
detailed in Table 5f. Intron S refers to the nucleotide sequence located between Exon S and
Exon T, and so on. Thus, the invention embodies purified, isolated, or recombinant
polynucleotides comprising a nucleotide sequence selected from the group consisting of the 3
introns of the sbg2 gene, or a sequence complementary thereto.
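In Table 5f, each intron begins one nucleotide after the preceding exon ends and ends one nucleotide before the next exon begins. The intron coordinates can therefore be derived from the exon coordinates alone. The Python sketch below does this, naming each intron after the exon that precedes it, and reports feature lengths under the assumption of 1-based, inclusive positions.

```python
# Exon coordinates of sbg2 in SEQ ID No 1, as listed in Table 5f
SBG2_EXONS = {
    "S": (201188, 201234),
    "T": (214676, 214793),
    "U": (215702, 215746),
    "V": (216836, 216915),
}


def introns_from_exons(exons):
    """Intron coordinates between consecutive exons, keyed by the preceding exon."""
    ordered = sorted(exons.items(), key=lambda item: item[1][0])
    introns = {}
    for (name, (_, end)), (_, (next_start, _)) in zip(ordered, ordered[1:]):
        introns[name] = (end + 1, next_start - 1)
    return introns


def length(coords):
    start, end = coords
    return end - start + 1


introns = introns_from_exons(SBG2_EXONS)
for name, coords in introns.items():
    print("intron", name, coords, length(coords), "nt")
print("exonic total:", sum(length(c) for c in SBG2_EXONS.values()), "nt")  # 290 nt
```

The derived coordinates reproduce the intron columns of Table 5f, which is a quick consistency check on the table.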

The invention also encompasses a purified, isolated, or recombinant polynucleotide
comprising a nucleotide sequence of sbg2 having at least 70, 75, 80, 85, 90, 95, 98 or 99%
nucleotide identity with a sequence selected from the group consisting of nucleotide positions
201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and 216836 to
216915 of SEQ ID No. 1 or a complementary sequence thereto or a fragment thereof. The
nucleotide differences with respect to nucleotide positions 201188 to 216915, 201188 to 201234,
214676 to 214793, 215702 to 215746 and 216836 to 216915 of SEQ ID No. 1 may be
distributed generally at random throughout the entire nucleic acid.

Another object of the invention relates to purified, isolated or recombinant nucleic acids
comprising a polynucleotide that hybridizes, under the stringent hybridization conditions
defined herein, with a polynucleotide selected from the group consisting of nucleotide positions
201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and 216836 to
216915 of SEQ ID No 1, or a sequence complementary thereto or a variant thereof or a
biologically active fragment thereof.

Additional preferred nucleic acids of the invention include isolated, purified, or 
recombinant sbg2 polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 
30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100 or 200 nucleotides of nucleotide positions
201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and 216836 to
216915 of SEQ ID No 1 or the complements thereof, wherein said contiguous span comprises
an sbg2-related biallelic marker. Optionally, said biallelic marker is selected from the group 
consisting of A79 to A99. It should be noted that nucleic acid fragments of any size and 
sequence may also be comprised by the polynucleotides described in this section. Either the 
original or the alternative allele may be present at said biallelic marker. 
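For a contiguous span of a given length that must contain a biallelic marker and stay within one of the recited ranges, the admissible start positions form a single interval. The Python sketch below computes that interval. The marker position used is hypothetical, because the positions of markers A79 to A99 are not given in this section.

```python
def span_starts(range_start, range_end, marker_pos, span_len):
    """1-based start positions of spans of length span_len that lie inside
    [range_start, range_end] (inclusive) and include marker_pos.

    Returns an empty range when no such span exists.
    """
    lo = max(range_start, marker_pos - span_len + 1)
    hi = min(marker_pos, range_end - span_len + 1)
    return range(lo, hi + 1) if lo <= hi else range(0)


# Hypothetical marker at position 214700, inside exon T (214676 to 214793)
starts = span_starts(214676, 214793, 214700, 50)
print(starts[0], starts[-1], len(starts))  # 214676 214700 25
```

Either allele may occupy the marker position, so each admissible start yields two candidate sequences.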

An sbg2 polynucleotide or gene may further contain regulatory sequences both in the
non-coding 5'-flanking region and in the non-coding 3'-flanking region that border the region
containing said genes or exons. Polynucleotides derived from 5' and 3' regulatory regions are
useful in order to detect the presence of at least one copy of a nucleotide sequence comprising an
sbg2 nucleotide sequence of SEQ ID No. 1 or a fragment thereof in a test sample.
Polynucleotides carrying the regulatory elements located at the 5' end and at the 3' end of the 
genes comprising the exons of the present invention may be advantageously used to control the 
transcriptional and translational activity of a heterologous polynucleotide of interest. 

While this section is entitled "Genomic Sequences of sbg2 polynucleotides," it should be noted
that nucleic acid fragments of any size and sequence may also be comprised by the
polynucleotides described in this section, flanking the genomic sequences of sbg2 on either side
or between two or more such genomic sequences.

Polynucleotide Constructs 

The terms "polynucleotide construct" and "recombinant polynucleotide" are used
interchangeably herein to refer to linear or circular, purified or isolated polynucleotides that 
have been artificially designed and which comprise at least two nucleotide sequences that are 
not found as contiguous nucleotide sequences in their initial natural environment. It should be 
noted that the present invention embodies recombinant vectors comprising any one of the 
polynucleotides described in the present invention. 

DNA Constructs that Enable Directing Temporal and Spatial Expression of sbgl,
g34665, sbg2, g35017 and g35018 Nucleic Acid Sequences in Recombinant Cell Hosts and 
in Transgenic Animals 

In order to study the physiological and phenotypic consequences of a lack of synthesis
of a protein encoded by a nucleotide sequence comprising an sbgl, g34665, sbg2, g35017 or
g35018 polynucleotide, both at the cell level and at the multicellular organism level, the
invention also encompasses DNA constructs and recombinant vectors enabling a conditional
expression of a specific allele of a nucleotide sequence comprising an sbgl, g34665, sbg2,
g35017 or g35018 polynucleotide and also of a copy of a sequence comprising a nucleotide
sequence of an sbgl, g34665, sbg2, g35017 or g35018 polynucleotide, or a fragment thereof,
harboring substitutions, deletions, or additions of one or more bases. These base substitutions,
deletions or additions may be located either in an exon, an intron or a regulatory sequence, in
particular a 5' regulatory sequence of an sbgl, g34665, sbg2, g35017 or g35018 polynucleotide.
In a preferred embodiment, the nucleotide sequence comprising an sbgl, g34665, sbg2, g35017
or g35018 polynucleotide further comprises a biallelic marker of the present invention.

A first preferred DNA construct is based on the tetracycline resistance operon tet from
E. coli transposon Tn10 for controlling the expression of an sbgl, g34665, sbg2, g35017 or
g35018 polynucleotide, as described by Gossen et al. (1992, 1995) and Furth et al. (1994).
Such a DNA construct contains seven tet operator sequences from Tn10 (tetop) that are fused to
either a minimal promoter or a 5'-regulatory sequence of the sbgl, g34665, sbg2, g35017 or
g35018 polynucleotide, said minimal promoter or said sbgl, g34665, sbg2, g35017 or g35018
polynucleotide regulatory sequence being operably linked to a polynucleotide of interest that
codes either for a sense or an antisense oligonucleotide or for a polypeptide, including an sbgl,
g34665, sbg2, g35017 or g35018 polynucleotide-encoded polypeptide or a peptide fragment
thereof. This DNA construct is functional as a conditional expression system for the nucleotide
sequence of interest when the same cell also comprises a nucleotide sequence coding for either
the wild type (tTA) or the mutant (rTA) repressor fused to the activating domain of viral
protein VP16 of herpes simplex virus, placed under the control of a promoter, such as the
HCMVIE1 enhancer/promoter or the MMTV-LTR. Indeed, a preferred DNA construct of the
invention comprises both the polynucleotide containing the tet operator sequences and the
polynucleotide containing a sequence coding for the tTA or the rTA repressor.

In a specific embodiment in which the conditional expression DNA construct contains the
sequence encoding the mutant tetracycline repressor rTA, expression of the polynucleotide
of interest is silent in the absence of tetracycline and induced in its presence.
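The two transactivators behave as mirror images. The wild-type tTA binds the tet operators only in the absence of tetracycline, so the target is expressed without the drug and silenced by it. The mutant rTA binds, and therefore activates, only when tetracycline is present. The truth table below is a schematic Python sketch of that logic. It deliberately ignores leaky expression and dose dependence.

```python
from enum import Enum


class Transactivator(Enum):
    TTA = "tTA"  # wild-type repressor-VP16 fusion
    RTA = "rTA"  # mutant repressor-VP16 fusion


def target_expressed(transactivator, tetracycline_present):
    """Schematic on/off state of a tetop-driven polynucleotide of interest."""
    if transactivator is Transactivator.TTA:
        # tTA binds the tet operators only in the absence of tetracycline
        return not tetracycline_present
    # rTA binds the tet operators only in the presence of tetracycline
    return tetracycline_present


for ta in Transactivator:
    for drug in (False, True):
        state = "on" if target_expressed(ta, drug) else "off"
        print(f"{ta.value:>3}  tetracycline={drug!s:<5}  expression {state}")
```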

DNA Constructs Allowing Homologous Recombination: Replacement Vectors 
A second preferred DNA construct will comprise, from 5'-end to 3'-end: (a) a first
nucleotide sequence comprising an sbgl polynucleotide; (b) a nucleotide sequence comprising
a positive selection marker, such as the marker for neomycin resistance (neo); and (c) a
second nucleotide sequence comprising a respective sbgl polynucleotide, located in the
genome downstream of the first sbgl polynucleotide sequence (a). Also encompassed are DNA
constructs prepared in an analogous manner using g34665, sbg2, g35017 or g35018 nucleotide
sequences in place of the sbgl sequences described above.

In a preferred embodiment, this DNA construct also comprises a negative selection
marker located upstream of the nucleotide sequence (a) or downstream of the nucleotide sequence
(c). Preferably, the negative selection marker comprises the thymidine kinase (tk) gene
(Thomas et al., 1986), the hygromycin beta gene (Te Riele et al., 1990), the hprt gene (Van
der Lugt et al., 1991; Reid et al., 1990) or the Diphtheria toxin A fragment (Dt-A) gene (Nada et
al., 1993; Yagi et al., 1990). Preferably, the positive selection marker is located within an exon
of an sbgl, g34665, sbg2, g35017 or g35018 polynucleotide so as to interrupt the sequence
encoding the sbgl, g34665, sbg2, g35017 or g35018 protein. These replacement vectors are
described, for example, by Thomas et al. (1986; 1987), Mansour et al. (1988) and Koller et
al. (1992).

The first and second nucleotide sequences (a) and (c) may each be located
within an sbgl, g34665, sbg2, g35017 or g35018 polynucleotide regulatory sequence, an
intronic sequence, an exon sequence or a sequence containing both regulatory and/or intronic
and/or exon sequences. The sizes of nucleotide sequences (a) and (c) range from 1 to 50
kb, preferably from 1 to 10 kb, more preferably from 2 to 6 kb and most preferably from 2 to 4
kb.
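The replacement vector described above has a fixed 5'-to-3' layout and stated size tiers for its homology sequences. The Python sketch below checks a hypothetical construct description against that layout and reports which tiers (1 to 50 kb, 1 to 10 kb, 2 to 6 kb, 2 to 4 kb) each homology sequence satisfies. The role names and element sizes are illustrative labels, not part of the specification.

```python
# Size ranges (kb) stated in the text for homology sequences (a) and (c)
SIZE_TIERS_KB = [
    ("stated range", 1, 50),
    ("preferred", 1, 10),
    ("more preferred", 2, 6),
    ("most preferred", 2, 4),
]

# Allowed 5'->3' layouts: (a) arm, (b) positive marker, (c) arm, with an optional
# negative selection marker upstream of (a) or downstream of (c)
ALLOWED_LAYOUTS = [
    ["arm_a", "positive_marker", "arm_c"],
    ["negative_marker", "arm_a", "positive_marker", "arm_c"],
    ["arm_a", "positive_marker", "arm_c", "negative_marker"],
]


def arm_tiers(size_kb):
    """All size tiers from the text that a homology sequence of size_kb satisfies."""
    return [label for label, low, high in SIZE_TIERS_KB if low <= size_kb <= high]


def check_layout(elements):
    """elements: 5'->3' list of (role, size_kb) pairs; True if the order is allowed."""
    return [role for role, _ in elements] in ALLOWED_LAYOUTS


# Hypothetical replacement vector
construct = [("negative_marker", 1.2), ("arm_a", 3.0), ("positive_marker", 1.6), ("arm_c", 5.0)]
print(check_layout(construct))  # True
for role, size in construct:
    if role.startswith("arm"):
        print(role, size, "kb:", arm_tiers(size))
```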

DNA Constructs Allowing Homologous Recombination: Cre-LoxP System

These new DNA constructs make use of the site-specific recombination system of the
P1 phage. The P1 phage possesses a recombinase called Cre which interacts specifically with a
34 base pair loxP site. The loxP site is composed of two palindromic sequences of 13 bp
separated by an 8 bp conserved sequence (Hoess et al., 1986). Recombination by the Cre
enzyme between two loxP sites having an identical orientation leads to the deletion of the DNA
fragment located between them.
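The excision rule can be made concrete with the canonical 34-bp loxP sequence of bacteriophage P1, ATAACTTCGTATA GCATACAT TATACGAAGTTAT. This consists of a 13-bp repeat, an 8-bp spacer and a 13-bp repeat. The Python sketch below is a string-level model, not a simulation of the recombination chemistry. When two loxP sites lie in the same orientation on one strand, the interval between them and one of the sites are removed, leaving a single loxP site in place. The flanked "marker" is a placeholder.

```python
# Canonical 34-bp loxP site of bacteriophage P1: 13-bp repeat, 8-bp spacer, 13-bp repeat
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(seq):
    """String-level model of Cre-mediated excision.

    If seq carries two loxP sites in the same orientation (both reading as LOXP on
    this strand), the interval between them and one of the sites are removed,
    leaving a single loxP site. A site in the opposite orientation is not matched,
    because the asymmetric spacer makes its sequence differ from LOXP.
    """
    first = seq.find(LOXP)
    if first == -1:
        return seq
    second = seq.find(LOXP, first + len(LOXP))
    if second == -1:
        return seq
    return seq[: first + len(LOXP)] + seq[second + len(LOXP):]


# Hypothetical floxed marker: lowercase letters stand in for the marker cassette
floxed = "GGGG" + LOXP + "acgtacgtacgt" + LOXP + "CCCC"
print(len(LOXP))                                     # 34
print(cre_excise(floxed) == "GGGG" + LOXP + "CCCC")  # True
```

Two sites in inverted orientation would instead lead to inversion of the intervening DNA, which this excision-only model does not attempt to represent.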

The Cre-loxP system used in combination with a homologous recombination technique
was first described by Gu et al. (1993, 1994). Briefly, a nucleotide sequence of interest to
be inserted in a targeted location of the genome harbors at least two loxP sites in the same
orientation and located at the respective ends of a nucleotide sequence to be excised from the
recombinant genome. The excision event requires the presence of the recombinase (Cre)
enzyme within the nucleus of the recombinant cell host. The recombinase enzyme may be
supplied at the desired time either by (a) incubating the recombinant cell hosts in a culture
medium containing this enzyme, by injecting the Cre enzyme directly into the desired cell, as
described by Araki et al. (1995), or by lipofection of the enzyme into the cells, as
described by Baubonis et al. (1993); (b) transfecting the cell host with a vector comprising the
Cre coding sequence operably linked to a promoter functional in the recombinant cell host,
said promoter being optionally inducible, said vector being introduced in the recombinant cell
host, as described by Gu et al. (1993) and Sauer et al. (1988); or (c) introducing in the genome
of the cell host a polynucleotide comprising the Cre coding sequence operably linked to a
promoter functional in the recombinant cell host, which promoter is optionally inducible, and
said polynucleotide being inserted in the genome of the cell host either by a random insertion
event or a homologous recombination event, as described by Gu et al. (1994).

In a specific embodiment, the vector containing the sequence to be inserted in an sbgl,
g34665, sbg2, g35017 or g35018 gene sequence by homologous recombination is constructed in
such a way that selectable markers are flanked by loxP sites of the same orientation, so that it is
possible, by treatment with the Cre enzyme, to eliminate the selectable markers while leaving the
sbgl, g34665, sbg2, g35017 or g35018 polynucleotide sequences of interest that have been
inserted by a homologous recombination event. Again, two selectable markers are needed: a
positive selection marker to select for the recombination event and a negative selection marker
to select for the homologous recombination event. Vectors and methods using the Cre-loxP
system are described by Zou et al. (1994).

Thus, in one aspect, a further preferred DNA construct of the invention comprises, from
5'-end to 3'-end: (a) a first nucleotide sequence that is comprised by an sbgl polynucleotide; (b)
a nucleotide sequence comprising a polynucleotide encoding a positive selection marker, said
nucleotide sequence comprising additionally two sequences defining a site recognized by a
recombinase, such as a loxP site, the two sites being placed in the same orientation; and (c) a
second nucleotide sequence comprising an sbgl polynucleotide, located in the genome
downstream of the first sbgl polynucleotide sequence (a). Also encompassed are DNA
constructs prepared in an analogous manner using g34665, sbg2, g35017 or g35018 nucleotide
sequences in place of the sbgl sequences described above.

The sequences defining a site recognized by a recombinase, such as a loxP site, are 
preferably located within the nucleotide sequence (b) at suitable locations bordering the 
nucleotide sequence for which the conditional excision is sought. In one specific embodiment, 
two loxP sites are located on either side of the positive selection marker sequence, in order to
allow its excision at a desired time after the occurrence of the homologous recombination event. 

In a preferred embodiment of a method using the third DNA construct described above, 
the excision of the polynucleotide fragment bordered by the two sites recognized by a 
recombinase, preferably two loxP sites, is performed at a desired time, due to the presence 
within the genome of the recombinant host cell of a sequence encoding the Cre enzyme 
operably linked to a promoter sequence, preferably an inducible promoter, more preferably a
tissue-specific promoter sequence and most preferably a promoter sequence which is both 
inducible and tissue-specific, such as described by Gu et al.(1994). 

The presence of the Cre coding sequence within the genome of the recombinant cell host may
result from the breeding of two transgenic animals, the first transgenic animal bearing the sbgl,
g34665, sbg2, g35017 or g35018 polynucleotide-derived sequence of interest containing the
loxP sites as described above and the second transgenic animal bearing the Cre coding sequence
operably linked to a suitable promoter sequence, as described by Gu et al. (1994).

Spatio-temporal control of the Cre enzyme expression may also be achieved with an 
adenovirus-based vector that contains the Cre gene, thus allowing infection of cells, or in vivo
infection of organs, for delivery of the Cre enzyme, as described by Anton and Graham
(1995) and Kanegae et al. (1995).

The DNA constructs described above may be used to introduce a desired nucleotide
sequence of the invention, preferably an sbgl, g34665, sbg2, g35017 or g35018 polynucleotide,
and most preferably an altered copy of an sbgl, g34665, sbg2, g35017 or g35018 polynucleotide
sequence, within a predetermined location of the targeted genome, leading either to the
generation of an altered copy of a targeted gene (knock-out homologous recombination) or to
the replacement of a copy of the targeted gene by another copy sufficiently homologous to
allow a homologous recombination event to occur (knock-in homologous recombination). In a
specific embodiment, the DNA constructs described above may be used to introduce an sbgl,
g34665, sbg2, g35017 or g35018 polynucleotide.

Nuclear Antisense DNA Constructs 

Other compositions containing a vector of the invention comprise an oligonucleotide
fragment of the sbgl, g34665, sbg2, g35017 or g35018 polynucleotide sequences of SEQ ID
No. 1, respectively, as an antisense tool that inhibits the expression of the corresponding gene.
Preferred methods using antisense polynucleotides according to the present invention are the
procedures described by Sczakiel et al. (1995) or those described in PCT Application No WO
95/24223.

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long)
that are complementary to the 5' end of an sbgl, g34665, sbg2, g35017 or g35018
polynucleotide mRNA. In one embodiment, a combination of different antisense
polynucleotides complementary to different parts of the desired targeted gene is used.
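An antisense polynucleotide complementary to the 5' end of an mRNA is, written 5' to 3', the reverse complement of the mRNA's first nucleotides. The minimal Python sketch below assumes that the transcript is available as its sense-strand DNA sequence, with T in place of U. It enforces the 15-200 nt length range given above. The transcript itself is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_to_5prime_end(mrna, length=20):
    """Antisense sequence, written 5'->3', complementary to the first `length` nt of mrna."""
    if not 15 <= length <= 200:
        raise ValueError("length outside the 15-200 nt range given in the text")
    if length > len(mrna):
        raise ValueError("mRNA shorter than the requested antisense length")
    return reverse_complement(mrna[:length])


# Hypothetical transcript, written as its sense-strand DNA (T instead of U)
toy_mrna = "ATGGCTAGCTAGGATCCATGCATGCATGGCATTAGC"
print(antisense_to_5prime_end(toy_mrna, 15))  # ATCCTAGCTAGCCAT
```

Applying the same function to other windows of the transcript would give the combination of antisense polynucleotides against different parts of the target mentioned above.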

Preferably, the antisense polynucleotides of the invention have a 3' polyadenylation
signal that has been replaced with a self-cleaving ribozyme sequence, such that RNA
polymerase II transcripts are produced without poly(A) at their 3' ends, these antisense
polynucleotides being incapable of export from the nucleus, as described by Liu et
al. (1994). In a preferred embodiment, these sbgl, g34665, sbg2, g35017 or g35018 antisense
polynucleotides also comprise, within the ribozyme cassette, a histone stem-loop structure to
stabilize cleaved transcripts against 3'-5' exonucleolytic degradation, such as the structure
described by Eckner et al. (1991).

Oligonucleotide Probes And Primers 

The polynucleotides of the invention are useful in order to detect the presence of at least
one copy of a nucleotide sequence of SEQ ID No. 1 or of the respective sbgl, g34665, sbg2,
g35017 and g35018 polynucleotide or gene, or a fragment, complement, or variant thereof in a
test sample.

Particularly preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35,
40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 1000 or 2000 nucleotides of SEQ ID No 1, to the extent
that said span is consistent with the length of the nucleotide position range, wherein said
contiguous span comprises at least 1, 2, 3, 4, 5, 7 or 10 of the following nucleotide positions of
SEQ ID No 1:

(a) nucleotide positions 31 to 292651 and 292844 to 319608;

(b) 290653 to 292652, 292653 to 296047, 292653 to 292841, 295555 to 296047,
295580 to 296047 and 296048 to 298048;

(c) 94124 to 94964;

(d) 31 to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to
25740, 29388 to 29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 and 65854 to 67854;

(e) 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952,
216661 to 217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to
231412, 231787 to 231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719
to 239807, 239719 to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617,
240528 to 240644, 240528 to 240824, 240528 to 240994, 240528 to 241685, 240800 to 240993
and 241686 to 243685;

(f) 201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and
216836 to 216915; or

(g) a complementary sequence thereto or a fragment thereof.
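Whether a candidate probe "comprises at least 1, 2, 3, 4, 5, 7 or 10" of the listed positions reduces to counting how many positions of the probe's span fall in the listed ranges. The Python sketch below assumes 1-based inclusive coordinates. It merges overlapping ranges first, because several listed ranges nest inside one another and positions should not be counted twice. For illustration, only the sbg2 exon ranges from list (f) and a hypothetical probe position are used.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_positions(span_start, span_end, ranges):
    """Number of positions of [span_start, span_end] lying in any of the ranges."""
    total = 0
    for start, end in merge_ranges(ranges):
        overlap = min(span_end, end) - max(span_start, start) + 1
        total += max(0, overlap)
    return total


# sbg2 exon ranges from list (f)
sbg2_exons = [(201188, 201234), (214676, 214793), (215702, 215746), (216836, 216915)]
# Hypothetical 20-nt probe straddling the 3' boundary of exon S
print(covered_positions(201225, 201244, sbg2_exons))  # 10
```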

Probes and primers of the invention also include isolated, purified, or recombinant 
polynucleotides having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with a contiguous 
span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 1000 or 
2000 nucleotides of nucleotide positions 31 to 292651 and 292844 to 319608 of SEQ ID No. 1. 
Preferred probes and primers of the invention also include isolated, purified, or recombinant
polynucleotides comprising an sbgl, g34665, sbg2, g35017 or g35018 nucleotide sequence
having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with at least one sequence selected
from the group consisting of the following nucleotide positions of SEQ ID No. 1:

(a) 290653 to 292652, 292653 to 296047, 292653 to 292841, 295555 to 296047,
295580 to 296047 and 296048 to 298048;

(b) 94124 to 94964;

(c) 31 to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to
25740, 29388 to 29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 and 65854 to 67854;

(d) 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952,
216661 to 217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to
231412, 231787 to 231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719
to 239807, 239719 to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617,
240528 to 240644, 240528 to 240824, 240528 to 240994, 240528 to 241685, 240800 to 240993
and 241686 to 243685;

(e) 201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and
216836 to 216915; or

(f) a complementary sequence thereto or a fragment thereof. 

Another set of probes and primers of the invention includes isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35,
40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1 or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10
nucleotide positions of any one of the ranges of nucleotide positions, designated pos1 to pos166,
of SEQ ID No. 1 listed in Table 1 above.

The invention also relates to nucleic acid probes characterized in that they hybridize 
specifically, under the stringent hybridization conditions defined above, with a contiguous span 
of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 1000 or 2000 
nucleotides of nucleotide positions 31 to 292651 and 292844 to 319608 of SEQ ID No. 1, or a 
variant thereof or a sequence complementary thereto. Particularly preferred are nucleic acid 
probes characterized in that they hybridize specifically, under the stringent hybridization 
conditions defined above, with a nucleic acid selected from the group consisting of nucleotide 
positions: 

(a) 290653 to 292652, 292653 to 296047, 292653 to 292841, 295555 to 296047, 
295580 to 296047 and 296048 to 298048; 

(b) 94124 to 94964; 
(c) 31 to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to
25740, 29388 to 29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 and 65854 to 67854;

(d) 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952,
216661 to 217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to
231412, 231787 to 231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719
to 239807, 239719 to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617,
240528 to 240644, 240528 to 240824, 240528 to 240994, 240528 to 241685, 240800 to 240993
and 241686 to 243685;

(e) 201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and
216836 to 216915; or

(f) a complementary sequence thereto or a fragment thereof. 

The formation of stable hybrids depends on the melting temperature (Tm) of the DNA.
The Tm depends on the length of the primer or probe, the ionic strength of the solution and the
G+C content. The higher the G+C content of the primer or probe, the higher the melting
temperature, because G:C pairs are held together by three hydrogen bonds whereas A:T pairs
have only two. The GC content of the probes of the invention usually ranges between 10 and 75%,
preferably between 35 and 60%, and more preferably between 40 and 55%.
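The qualitative G+C dependence described above can be made concrete with the Wallace rule, a common rule of thumb that is not part of this specification: for short oligonucleotides, Tm is roughly 2 °C per A or T plus 4 °C per G or C. The Python sketch below computes GC content and this estimate and checks the narrowest preferred GC window quoted above. The helper names and the example probe are hypothetical, and the rule ignores the ionic-strength and length effects mentioned in the text.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> float:
    """Rough Tm in degrees C: 2 per A/T plus 4 per G/C (short oligos only;
    salt and probe concentration are not modelled)."""
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2.0 * at + 4.0 * gc


def in_preferred_gc_range(seq: str, low: float = 40.0, high: float = 55.0) -> bool:
    """Check against the 40 to 55% window given as more preferred above."""
    return low <= gc_content(seq) <= high


probe = "ATGCGTACGTTAGCCTAGGA"  # hypothetical 20-mer with 10 G/C bases
print(gc_content(probe), wallace_tm(probe), in_preferred_gc_range(probe))  # 50.0 60.0 True
```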

A probe or a primer according to the invention may be between 8 and 2000 nucleotides
in length, or may be specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250,
500 or 1000 nucleotides in length. More particularly, the length of these probes can range from 8,
10, 15, 20, or 30 to 100 nucleotides, preferably from 10 to 50, and more preferably from 15 to 30
nucleotides. Shorter probes tend to lack specificity for a target nucleic acid sequence and
generally require lower temperatures to form sufficiently stable hybrid complexes with the
template. Longer probes are expensive to produce and can sometimes self-hybridize to form
hairpin structures. The appropriate length for primers and probes under a particular set of assay
conditions may be determined empirically by one of skill in the art.

The primers and probes can be prepared by any suitable method, including, for
example, cloning and restriction of appropriate sequences and direct chemical synthesis by a
method such as the phosphotriester method of Narang et al. (1979), the phosphodiester method
of Brown et al. (1979), the diethylphosphoramidite method of Beaucage et al. (1981) and the
solid support method described in EP 0 707 592.

Detection probes are generally nucleic acid sequences or uncharged nucleic acid
analogs such as, for example, peptide nucleic acids, which are disclosed in International Patent
Application WO 92/20702, or morpholino analogs, which are described in U.S. Patent Nos.
5,185,444; 5,034,506 and 5,142,047. The probe may have to be rendered "non-extendable" in
that additional dNTPs cannot be added to the probe. Analogs are usually non-extendable in and
of themselves, and nucleic acid probes can be rendered non-extendable by modifying the 3' end
of the probe such that the hydroxyl group is no longer capable of participating in elongation.
For example, the 3' end of the probe can be functionalized with the capture or detection label to
thereby consume or otherwise block the hydroxyl group. Alternatively, the 3' hydroxyl group
simply can be cleaved, replaced or modified; U.S. Patent Application Serial No. 07/049,061,
filed Apr. 19, 1993, describes modifications which can be used to render a probe
non-extendable.

Any of the polynucleotides of the present invention can be labeled, if desired, by
incorporating a label detectable by spectroscopic, photochemical, biochemical,
immunochemical, or chemical means. For example, useful labels include radioactive
substances (³²P, ³⁵S, ³H, ¹²⁵I), fluorescent dyes (5-bromodeoxyuridine, fluorescein,
acetylaminofluorene, digoxigenin) or biotin. Preferably, polynucleotides are labeled at their 3'
and 5' ends. Examples of non-radioactive labeling of nucleic acid fragments are described in
French patent No. FR-7810975 or by Urdea et al. (1988) or Sanchez-Pescador et al. (1988).

In addition, the probes according to the present invention may have structural characteristics
that allow signal amplification; such structural characteristics include, for example,
branched DNA probes such as those described by Urdea et al. (1991) or in European patent No.
EP 0 225 807 (Chiron).

A label can also be used to capture the primer, so as to facilitate the immobilization of
either the primer or a primer extension product, such as amplified DNA, on a solid support. A
capture label is attached to the primers or probes and can be a specific binding member which
forms a binding pair with the solid phase reagent's specific binding member (e.g. biotin and
streptavidin). Therefore, depending upon the type of label carried by a polynucleotide or a
probe, it may be employed to capture or to detect the target DNA. Further, it will be understood
that the polynucleotides, primers or probes provided herein may themselves serve as the
capture label. For example, in the case where a solid phase reagent's binding member is a
nucleic acid sequence, it may be selected such that it binds a complementary portion of a primer
or probe to thereby immobilize the primer or probe to the solid phase. In cases where a
polynucleotide probe itself serves as the binding member, those skilled in the art will recognize
that the probe will contain a sequence or "tail" that is not complementary to the target. In the
case where a polynucleotide primer itself serves as the capture label, at least a portion of the
primer will be free to hybridize with a nucleic acid on a solid phase. DNA labeling techniques
are well known to the skilled technician.

The probes of the present invention are useful for a number of purposes. They can
notably be used in Southern hybridization to genomic DNA. The probes can also be used to detect
PCR amplification products. They may also be used to detect mismatches in a sequence
comprising a polynucleotide of SEQ ID Nos 1 to 26, 36 to 40 and 54 to 229, or an sbg1,
g34665, sbg2, g35017 or g35018 polynucleotide or gene or mRNA using other techniques.

Any of the polynucleotides, primers and probes of the present invention can be
conveniently immobilized on a solid support. Solid supports are known to those skilled in the
art and include the walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic
beads, nitrocellulose strips, membranes, microparticles such as latex particles, sheep (or other
animal) red blood cells, duracytes and others. The solid support is not critical and can be
selected by one skilled in the art. Thus, latex particles, microparticles, magnetic or non-magnetic
beads, membranes, plastic tubes, walls of microtiter wells, glass or silicon chips,
sheep (or other suitable animal's) red blood cells and duracytes are all suitable examples.
Suitable methods for immobilizing nucleic acids on solid phases include ionic, hydrophobic and
covalent interactions and the like. A solid support, as used herein, refers to any material which
is insoluble, or can be made insoluble by a subsequent reaction. The solid support can be
chosen for its intrinsic ability to attract and immobilize the capture reagent. Alternatively, the
solid phase can retain an additional receptor which has the ability to attract and immobilize the
capture reagent. The additional receptor can include a charged substance that is oppositely
charged with respect to the capture reagent itself or to a charged substance conjugated to the
capture reagent. As yet another alternative, the receptor molecule can be any specific binding
member which is immobilized upon (attached to) the solid support and which has the ability to
immobilize the capture reagent through a specific binding reaction. The receptor molecule
enables the indirect binding of the capture reagent to a solid support material before the
performance of the assay or during the performance of the assay. The solid phase thus can be a
plastic, derivatized plastic, magnetic or non-magnetic metal, glass or silicon surface of a test
tube, microtiter well, sheet, bead, microparticle, chip, sheep (or other suitable animal's) red
blood cells, duracytes and other configurations known to those of ordinary skill in the art. The
polynucleotides of the invention can be attached to or immobilized on a solid support
individually, or in groups of at least 2, 5, 8, 10, 12, 15, 20, or 25 distinct polynucleotides of the
invention per solid support. In addition, polynucleotides other than those of the
invention may be attached to the same solid support as one or more polynucleotides of the
invention.

Consequently, the invention also comprises a method for detecting the presence of a
nucleic acid comprising a nucleotide sequence selected from the group consisting of SEQ ID Nos
1 to 26, 36 to 40 and 54 to 229, a fragment or a variant thereof or a complementary sequence
thereto in a sample, said method comprising the following steps:

a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes which
can hybridize with a nucleotide sequence included in a nucleic acid selected from the group
consisting of the nucleotide sequences of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, a
fragment or a variant thereof or a complementary sequence thereto and the sample to be
assayed; and

b) detecting the hybrid complex formed between the probe and a nucleic acid in the
sample.
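Step b) can be illustrated with an in silico analogue in Python: a probe can pair with any stretch of sample DNA that equals the probe's reverse complement. This is a string-matching sketch under stated assumptions, not a model of hybridization thermodynamics or stringency; the `max_mismatches` parameter stands in loosely for reduced stringency, and all names and sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hybridization_sites(probe: str, sample: str, max_mismatches: int = 0):
    """0-based start positions in `sample` where the probe could pair, i.e. where the
    sample matches the probe's reverse complement with at most `max_mismatches`."""
    target = reverse_complement(probe)
    sample = sample.upper()
    k = len(target)
    hits = []
    for i in range(len(sample) - k + 1):
        window = sample[i:i + k]
        if sum(a != b for a, b in zip(target, window)) <= max_mismatches:
            hits.append(i)
    return hits


probe = "AACCG"                                    # hypothetical probe; pairs with CGGTT
print(hybridization_sites(probe, "ATCGGTTAA"))     # [2]
print(hybridization_sites(probe, "ATCGATTAA"))     # [] (one mismatch at the site)
print(hybridization_sites(probe, "ATCGATTAA", 1))  # [2]
```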

The invention further concerns a kit for detecting the presence of a nucleic acid
comprising a nucleotide sequence selected from the group consisting of SEQ ID Nos. 1 to 26, 36
to 40 and 54 to 229, a fragment or a variant thereof or a complementary sequence thereto in a
sample, said kit comprising:

a) a nucleic acid probe or a plurality of nucleic acid probes which can hybridize with a
nucleotide sequence included in a nucleic acid selected from the group consisting of the
nucleotide sequences of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, a fragment or a variant
thereof or a complementary sequence thereto; and

b) optionally, the reagents necessary for performing the hybridization reaction.

In a first preferred embodiment of this detection method and kit, said nucleic acid probe
or the plurality of nucleic acid probes are labeled with a detectable molecule. In a second
preferred embodiment of said method and kit, said nucleic acid probe or the plurality of nucleic
acid probes has been immobilized on a substrate. In a third preferred embodiment, the nucleic
acid probe or the plurality of nucleic acid probes comprise either a sequence which is selected
from the group consisting of the nucleotide sequences of P1 to P360 and the complementary
sequence thereto, B1 to B229, C1 to C229, D1 to D360, E1 to E360, or a nucleotide sequence
comprising a biallelic marker selected from the group consisting of A1 to A360 or a
polymorphism selected from the group consisting of A361 to A489, or the complements thereto.

Oligonucleotide Arrays 

A substrate comprising a plurality of oligonucleotide primers or probes of the invention
may be used either for detecting or for amplifying targeted sequences in a nucleotide sequence of
SEQ ID No. 1, more particularly in an sbg1, g34665, sbg2, g35017 or g35018 polynucleotide,
or in genes comprising an sbg1, g34665, sbg2, g35017 or g35018 polynucleotide, and may also
be used for detecting mutations in the coding or in the non-coding sequences of an sbg1,
g34665, sbg2, g35017 or g35018 nucleic acid sequence, or of genes comprising an sbg1, g34665,
sbg2, g35017 or g35018 nucleic acid sequence.

Any polynucleotide provided herein may be attached in overlapping areas or at random
locations on the solid support. Alternatively, the polynucleotides of the invention may be
attached in an ordered array wherein each polynucleotide is attached to a distinct region of the
solid support which does not overlap with the attachment site of any other polynucleotide.
Preferably, such an ordered array of polynucleotides is designed to be "addressable", meaning
that the distinct locations are recorded and can be accessed as part of an assay procedure.
Addressable polynucleotide arrays typically comprise a plurality of different oligonucleotide
probes that are coupled to a surface of a substrate in different known locations. Knowing the
precise location of each polynucleotide makes these "addressable" arrays particularly useful
in hybridization assays. Any addressable array technology known in the art can be employed
with the polynucleotides of the invention. One particular embodiment of these polynucleotide
arrays is known as Genechips™, and has been generally described in US Patent 5,143,854 and
PCT publications WO 90/15070 and 92/10092. These arrays may generally be produced using
mechanical synthesis methods or light-directed synthesis methods which incorporate a
combination of photolithographic methods and solid phase oligonucleotide synthesis (Fodor et
al., 1991, incorporated herein by reference). The immobilization of arrays of oligonucleotides
on solid supports has been made possible by the development of a technology generally
identified as "Very Large Scale Immobilized Polymer Synthesis" (VLSIPS™) in which,
typically, probes are immobilized in a high-density array on a solid surface of a chip. Examples
of VLSIPS™ technologies are provided in US Patents 5,143,854 and 5,412,087 and in PCT
Publications WO 90/15070, WO 92/10092 and WO 95/11995, which describe methods for
forming oligonucleotide arrays through techniques such as light-directed synthesis.
In designing strategies aimed at providing arrays of nucleotides immobilized on solid supports,
further presentation strategies were developed to order and display the oligonucleotide arrays on
the chips in an attempt to maximize hybridization patterns and sequence information. Examples
of such presentation strategies are disclosed in PCT Publications WO 94/12305, WO 94/11530,
WO 97/29212 and WO 97/31256.

In another embodiment of the oligonucleotide arrays of the invention, an
oligonucleotide probe matrix may advantageously be used to detect mutations occurring in an
sbg1, g34665, sbg2, g35017 or g35018 polynucleotide, including in genes comprising an sbg1,
g34665, sbg2, g35017 or g35018 polynucleotide, and preferably in an sbg1, g34665, sbg2,
g35017 or g35018 polynucleotide regulatory region. For this particular purpose, probes are
specifically designed to have a nucleotide sequence allowing their hybridization to the genes
that carry known mutations (either by deletion, insertion or substitution of one or several
nucleotides). By known mutations in an sbg1, g34665, sbg2, g35017 or g35018 polynucleotide,
it is meant mutations in an sbg1, g34665, sbg2, g35017 or g35018 polynucleotide that have
already been identified; the technique used by Huang et al. (1996) or Samson et al. (1996), for
example, may be used to identify such mutations.

Another technique that is used to detect mutations in an sbg1, g34665, sbg2, g35017 or
g35018 polynucleotide is the use of a high-density DNA array. Each oligonucleotide probe
constituting a unit element of the high-density DNA array is designed to match a specific
subsequence of an sbg1, g34665, sbg2, g35017 or g35018 polynucleotide. Thus, an array
consisting of oligonucleotides complementary to subsequences of the target gene sequence is
used to determine the identity of the target sequence with the wild-type gene sequence, measure
its amount, and detect differences between the target sequence and the reference wild-type
nucleic acid sequence of an sbg1, g34665, sbg2, g35017 or g35018 polynucleotide. In one such
design, termed a 4L tiled array, a set of four probes (A, C, G, T), preferably 15-nucleotide
oligomers, is implemented for each position of the target. In each set of four probes, the perfect
complement will hybridize more strongly than the mismatched probes. Consequently, a nucleic
acid target of length L is scanned for mutations with a tiled array containing 4L probes, the
whole probe set containing all the possible mutations in the known wild-type reference sequence.
The hybridization signals of the 15-mer probe set tiled array are perturbed by a single base
change in the target sequence. As a consequence, there is a characteristic loss of signal, or
"footprint", for the probes flanking a mutation position. This technique was described by
Chee et al. in 1996.
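The 4L tiling scheme described above can be sketched in Python. This is an illustrative model under stated assumptions, not the design used by Chee et al. (1996): the interrogated base sits at the centre of each 15-mer, only positions with a full 15-nucleotide window are tiled (so a target of length L yields 4(L − 14) probes rather than exactly 4L), and hybridization strength is approximated by counting mismatches. All names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def tiled_probes(reference: str, probe_len: int = 15):
    """Return (position, base, probe) tuples, 0-based positions.

    For each position with a full window, four probes are built whose central
    (interrogated) base is A, C, G or T; each probe is the reverse complement of
    that variant window so that it pairs with the target strand.
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd so the interrogated base is central")
    reference = reference.upper()
    half = probe_len // 2
    probes = []
    for pos in range(half, len(reference) - half):
        window = reference[pos - half:pos + half + 1]
        for base in "ACGT":
            variant = window[:half] + base + window[half + 1:]
            probes.append((pos, base, reverse_complement(variant)))
    return probes


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def call_substitutions(reference: str, sample: str, probe_len: int = 15):
    """Positions where the best-matching probe's base differs from the reference.

    "Best-matching" (fewest mismatches) is a crude stand-in for the strongest
    hybridization signal. Assumes substitutions only (equal-length sequences).
    """
    reference, sample = reference.upper(), sample.upper()
    half = probe_len // 2
    by_position = {}
    for pos, base, probe in tiled_probes(reference, probe_len):
        by_position.setdefault(pos, []).append((base, probe))
    calls = []
    for pos, candidates in sorted(by_position.items()):
        window = sample[pos - half:pos + half + 1]
        best_base, _ = min(
            candidates, key=lambda c: mismatches(reverse_complement(c[1]), window)
        )
        if best_base != reference[pos]:
            calls.append((pos, reference[pos], best_base))
    return calls


reference = "ACGTTGCAAGCTTGCATGCAGGTC"          # hypothetical 24-nt target
sample = reference[:11] + "A" + reference[12:]  # T -> A substitution at position 11
print(len(tiled_probes(reference)))             # 40 (10 tiled positions x 4)
print(call_substitutions(reference, sample))    # [(11, 'T', 'A')]
```

With a substitution in the sample, only the probe set centred on the altered position favours a non-reference base; the flanking sets still favour the reference base but now carry one extra mismatch each, which is the string-level counterpart of the signal "footprint" described above.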

Consequently, the invention concerns an array of nucleic acid molecules comprising at
least one polynucleotide described above as probes and primers. Preferably, the invention
concerns an array of nucleic acids comprising at least two polynucleotides described above as
probes and primers.

Sbg1, g34665, sbg2, g35017 and g35018 Proteins and Polypeptide Fragments:

The terms "sbg1 polypeptides", "g34665 polypeptides", "sbg2 polypeptides", "g35017
polypeptides" and "g35018 polypeptides" are used herein to embrace all of the proteins and
polypeptides encoded by the respective sbg1, g34665, sbg2, g35017 and g35018 polynucleotides
of the present invention. Forming part of the invention are polypeptides encoded by the
polynucleotides of the invention, as well as fusion polypeptides comprising such polypeptides.
The invention embodies proteins from humans, mammals, primates and non-human primates,
and includes isolated or purified sbg1 proteins consisting of, consisting essentially of, or
comprising the sequence of SEQ ID Nos 27 to 35, isolated or purified g34665, g35017 and sbg2
proteins encoded by the g34665, g35017 and sbg2 polynucleotide sequences of SEQ ID No 1, and
isolated or purified g35018 proteins consisting of, consisting essentially of, or comprising the
sequence of SEQ ID Nos 41 to 43.
It should be noted that the sbg1, g34665, sbg2, g35017 and g35018 proteins of the
invention also comprise naturally-occurring variants of the amino acid sequence of the
respective human sbg1, g34665, sbg2, g35017 and g35018 proteins.

The present invention embodies isolated, purified, and recombinant polypeptides
comprising a contiguous span of at least 4 amino acids, preferably at least 6, more preferably at
least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids,
to the extent that said span is consistent with the length of a particular SEQ ID, of SEQ ID Nos
27 to 35 and 41 to 43. In other preferred embodiments the contiguous stretch of amino acids
comprises the site of a mutation or functional mutation, including a deletion, addition, swap or
truncation of the amino acids in an sbg1, g34665, sbg2, g35017 or g35018 protein sequence.

The invention also embodies isolated, purified, and recombinant sbg1 polypeptides
comprising a contiguous span of at least 4 amino acids, preferably at least 6 or at least 8 to 10
amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID
Nos 27 to 35, wherein said contiguous span comprises an amino acid variation according to
Table Se. 

The present inventors have further identified potential cleavage sites in the sbg1
polypeptides, and several specific sbg1 peptides. An sbg1 peptide has further been tested in
behavioral studies by injection in mice, as further detailed in Example 7. In particular, the
polypeptide of SEQ ID No 29 contains a protease cleavage site at amino acid positions 62 to 63;
the polypeptide of SEQ ID No 30 contains protease cleavage sites at amino acid positions 63 to
64 and 110 to 111; the polypeptide of SEQ ID No 32 contains a protease cleavage site at amino
acid positions 63 to 64; the polypeptide of SEQ ID No 33 contains protease cleavage sites at
amino acid positions 54 to 55 and 57 to 58; the polypeptide of SEQ ID No 34 contains
protease cleavage sites at amino acid positions 63 to 64 and 122 to 123; and the polypeptide of
SEQ ID No 35 contains protease cleavage sites at amino acid positions 62 to 63 and 63 to 64.
Additionally, the sbg1 polypeptides of SEQ ID Nos 30, 32 and 34 contain cysteine residues
predicted to be capable of forming a disulfide bridge, at amino acid positions 82 and 104 of SEQ
ID No 30, amino acid positions 82 and 106 of SEQ ID No 32, and amino acid positions 132
and 142 of SEQ ID No 34. In a particularly preferred embodiment, the invention comprises
isolated, purified, and recombinant sbg1 peptides comprising a contiguous span of at least 4
amino acids, preferably at least 6 or at least 8 to 10 amino acids, more preferably at least 12 or
15 amino acids of an amino acid position range selected from the group consisting of amino
acid positions: 1 to 63 and 64 to 102 of SEQ ID No 29; 1 to 64, 65 to 111 and 112 to 119 of
SEQ ID No 30; 1 to 64 and 65 to 126 of SEQ ID No 32; 1 to 64, 65 to 123 and 124 to 153 of
SEQ ID No 34; and 1 to 61 and 65 to 106 of SEQ ID No 35.
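The fragment ranges listed above follow a simple pattern that can be checked with a short Python sketch. The rule is an inference from the listed ranges, not a statement of the specification: each cleavage site is read as a pair of adjacent residues (i, i+1), the cut is placed after residue i+1, and the last listed position is assumed to be the chain length. Under these assumptions the sketch reproduces the ranges listed for SEQ ID Nos 29, 30, 32 and 34; it does not reproduce those listed for SEQ ID No 35 (1 to 61 and 65 to 106), where residues at the overlapping sites appear to be excluded, which the sketch does not model. The function name is hypothetical.

```python
def fragments_after_cleavage(length: int, sites):
    """Split residues 1..length, cutting after the second residue of each
    (i, i + 1) site. Positions are 1-based and inclusive."""
    cuts = sorted(second for _, second in sites)
    fragments, start = [], 1
    for cut in cuts:
        if start <= cut < length:
            fragments.append((start, cut))
            start = cut + 1
    fragments.append((start, length))
    return fragments


# SEQ ID No 30, assuming a chain length of 119 (the last listed position).
print(fragments_after_cleavage(119, [(63, 64), (110, 111)]))
# [(1, 64), (65, 111), (112, 119)]
```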
The invention further embodies sbg1, g34665, sbg2, g35017 and g35018 polypeptides,
including isolated and recombinant polypeptides, encoded respectively by sbg1, g34665, sbg2,
g35017 and g35018 polynucleotides consisting of, consisting essentially of, or comprising a
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200 or
500 nucleotides of SEQ ID No 1, to the extent that the length of said span is consistent with the
nucleotide position range, wherein said contiguous span comprises at least 1, 2, 3, 4, 5, 7
or 10 of the following nucleotide positions of SEQ ID No 1:

(a) 290653 to 292652, 292653 to 296047, 292653 to 292841, 295555 to 296047 and
295580 to 296047;

(b) 94144 to 94964;

(c) 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to 25740,
29388 to 29502, 29967 to 30282, 64666 to 64812, and 65505 to 65853;

(d) 215819 to 215941, 215819 to 215975, 216661 to 216952, 216661 to 217061,
217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to 231412, 231787 to
231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719 to 239807, 239719
to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617, 240528 to 240644,
240528 to 240824, 240528 to 240994, 240528 to 241685 and 240800 to 240993;

(e) 201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and
216836 to 216915; or the complements thereof.

The present invention further embodies isolated, purified, and recombinant
polypeptides encoded by an sbg1 polynucleotide or gene comprising at least one sbg1
nucleotide sequence selected from the group consisting of the following sbg1 exons: MS1, M1,
M692, M862, MS2, M1069, M1090, M1117, N, N2, Nbis, O, O1, O2, Obis, P, X, Q1, Q, Qbis,
R and Rbis.

The invention also encompasses purified, isolated, or recombinant polypeptides
comprising an amino acid sequence having at least 70, 75, 80, 85, 90, 95, 98 or 99% amino acid
identity with the amino acid sequence of SEQ ID Nos 27 to 35 and 41 to 43 or a fragment
thereof.

Sbg1, g34665, sbg2, g35017 and g35018 proteins are preferably isolated from human or
mammalian tissue samples or expressed from human or mammalian genes. The sbg1, g34665,
sbg2, g35017 and g35018 polypeptides of the invention can be made using routine expression
methods known in the art. The polynucleotide encoding the desired polypeptide is ligated into
an expression vector suitable for any convenient host. Both eukaryotic and prokaryotic host
systems can be used to form recombinant polypeptides; several common systems are described
below. The polypeptide is then isolated from lysed cells or from the culture medium
and purified to the extent needed for its intended use. Purification can be performed by any
technique known in the art, for example, differential extraction, salt fractionation,
chromatography, centrifugation, and the like. See, for example, Methods in Enzymology for a
variety of methods for purifying proteins.

In addition, shorter protein fragments can be produced by chemical synthesis.
Alternatively, the proteins of the invention can be extracted from cells or tissues of humans or
non-human animals. Methods for purifying proteins are known in the art, and include the use of
detergents or chaotropic agents to disrupt particles followed by differential extraction and
separation of the polypeptides by ion exchange chromatography, affinity chromatography,
sedimentation according to density, and gel electrophoresis.

Any sbg1, g34665, sbg2, g35017 or g35018 cDNA or fragment thereof, including the
respective cDNA sequences of SEQ ID Nos 2 to 26 and 36 to 40, can be used to express sbg1,
g34665, sbg2, g35017 or g35018 proteins and polypeptides. The nucleic acid encoding the sbg1,
g34665, sbg2, g35017 or g35018 protein or polypeptide to be expressed is operably linked to a
promoter in an expression vector using conventional cloning technology. The sbg1, g34665, sbg2,
g35017 or g35018 insert in the expression vector may comprise the full coding sequence for the
respective sbg1, g34665, sbg2, g35017 or g35018 protein or a portion thereof. For example, the
sbg1- or g35018-derived insert may encode a polypeptide comprising at least 10 consecutive amino
acids of the respective sbg1 or g35018 protein of SEQ ID Nos 27 to 35 and 41 to 43.

The expression vector can be any of the mammalian, yeast, insect or bacterial expression
systems known in the art. Vectors and expression systems are commercially available
from a variety of suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla,
California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to
enhance expression and facilitate proper protein folding, the codon context and codon pairing of
the sequence can be optimized for the particular expression organism into which the expression
vector is introduced, as explained by Hatfield et al., U.S. Patent No. 5,082,767.

In one embodiment, the entire coding sequence of the sbg1, g34665, sbg2, g35017 or
g35018 cDNA through the poly A signal of the cDNA is operably linked to a promoter in the
expression vector. Alternatively, if the nucleic acid encoding a portion of the sbg1, g34665, sbg2,
g35017 or g35018 protein lacks a methionine to serve as the initiation site, an initiating methionine
can be introduced next to the first codon of the nucleic acid using conventional techniques.
Similarly, if the insert from the sbg1, g34665, sbg2, g35017 or g35018 cDNA lacks a poly A
signal, this sequence can be added to the construct by, for example, splicing out the poly A signal
from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it
into the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion
of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct
allows efficient stable transfection. The vector includes the Herpes Simplex Thymidine Kinase
promoter and the selectable neomycin gene. The nucleic acid encoding the sbg1, g34665, sbg2,
g35017 or g35018 protein or a portion thereof is obtained by PCR from a bacterial vector
containing a nucleotide sequence of an exon of an sbg1, g34665, sbg2, g35017 or g35018
gene as described herein and in SEQ ID No 1, or from an sbg1 or g35018 cDNA comprising a
nucleic acid of SEQ ID Nos 2 to 26 and 36 to 40, using oligonucleotide primers complementary to
the sbg1, g34665, sbg2, g35017 or g35018 nucleic acid or portion thereof and containing
restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end
of the corresponding cDNA 3' primer, taking care to ensure that the sequence encoding the sbg1,
g34665, sbg2, g35017 or g35018 protein or a portion thereof is positioned properly with respect to
the poly A signal. The purified fragment obtained from the resulting PCR reaction is digested with
PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now
containing a poly A signal and digested with BglII.
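The primer design step above can be sketched in Python. The PstI (CTGCAG) and BglII (AGATCT) recognition sequences are standard; the 20-nucleotide annealing length, the 4-nucleotide GCGC clamp and the function names are hypothetical choices, not taken from the specification. The sketch also rejects inserts that contain either site internally, since digesting the PCR product with PstI and BglII would otherwise cut inside the coding sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PSTI_SITE = "CTGCAG"   # PstI recognition sequence (cuts CTGCA^G)
BGLII_SITE = "AGATCT"  # BglII recognition sequence (cuts A^GATCT)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def tailed_primers(insert: str, anneal_len: int = 20, clamp: str = "GCGC"):
    """Return (forward, reverse) primers for amplifying `insert`.

    The 5' (forward) primer carries a PstI site and the 3' (reverse) primer carries
    a BglII site at its 5' end. `anneal_len` and `clamp` (extra bases that help the
    enzymes cut near a DNA end) are hypothetical defaults.
    """
    insert = insert.upper()
    # Both sites are palindromic, so searching one strand covers both.
    for name, site in (("PstI", PSTI_SITE), ("BglII", BGLII_SITE)):
        if site in insert:
            raise ValueError(f"insert contains an internal {name} site")
    forward = clamp + PSTI_SITE + insert[:anneal_len]
    reverse = clamp + BGLII_SITE + reverse_complement(insert[-anneal_len:])
    return forward, reverse


insert = "ATG" + "GCA" * 10 + "TAA"  # hypothetical 36-nt coding insert
forward, reverse = tailed_primers(insert)
print(forward)
print(reverse)
```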

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life
Technologies, Inc., Grand Island, New York) under conditions outlined in the product
specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml
G418 (Sigma, St. Louis, Missouri).

Alternatively, the nucleic acid encoding the sbg1, g34665, sbg2, g35017 or g35018
protein or a portion thereof is cloned into pED6dpc2 (Genetics Institute, Cambridge, MA). The
resulting pED6dpc2 constructs are transfected into a suitable host cell, such as COS-1 cells.
Methotrexate-resistant cells are selected and expanded.

The above procedures may also be used to express a mutant sbg1, g34665, sbg2, g35017
or g35018 protein responsible for a detectable phenotype, or a portion thereof.

The expressed proteins are purified using conventional purification techniques such as
ammonium sulfate precipitation or chromatographic separation based on size or charge. The
protein encoded by the nucleic acid insert may also be purified using standard
immunochromatography techniques. In such procedures, a solution containing the expressed sbg1,
g34665, sbg2, g35017 or g35018 protein or portion thereof, such as a cell extract, is applied to a
column having antibodies against the sbg1, g34665, sbg2, g35017 or g35018 protein or portion
thereof attached to the chromatography matrix. The expressed protein is allowed to bind the
immunochromatography column. Thereafter, the column is washed to remove non-specifically
bound proteins. The specifically bound expressed protein is then released from the column and
recovered using standard techniques.

To confirm expression of the sbgl, g34665, sbg2, g35017 or g35018 protein or a portion
thereof, the proteins expressed from host cells containing an expression vector containing an insert
encoding the sbgl, g34665, sbg2, g35017 or g35018 protein or a portion thereof can be compared
to the proteins expressed in host cells containing the expression vector without an insert. The
presence of a band in samples from cells containing the expression vector with an insert which is
absent in samples from cells containing the expression vector without an insert indicates that the
sbgl, g34665, sbg2, g35017 or g35018 protein or a portion thereof is being expressed. Generally,
the band will have the mobility expected for the sbgl, g34665, sbg2, g35017 or g35018 protein or
portion thereof. However, the band may have a mobility different from that expected as a result of
modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
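The expected mobility is usually judged against the calculated mass of the unmodified polypeptide. The Python sketch below estimates that mass from a one-letter amino acid sequence using standard average residue masses. It is an illustration only: the function name `expected_mass_kda` is hypothetical, and the example sequence is a placeholder rather than one of the sbgl, g34665, sbg2, g35017 or g35018 sequences.

```python
# Standard average residue masses in daltons (free amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water restores the free N- and C-termini


def expected_mass_kda(sequence: str) -> float:
    """Approximate mass (kDa) of an unmodified polypeptide from its one-letter sequence."""
    seq = "".join(sequence.split()).upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    mass_da = sum(AVG_RESIDUE_MASS[residue] for residue in seq) + WATER_MASS
    return mass_da / 1000.0
```

For example, the dipeptide "GA" gives about 0.146 kDa. Glycosylation, ubiquitination or cleavage shifts the observed band away from this calculated value, so a mismatch alone does not rule out expression.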

Antibodies capable of specifically recognizing the expressed sbgl, g34665, sbg2, g35017
or g35018 protein or a portion thereof are described below.

If antibody production is not possible, the nucleic acids encoding the sbgl, g34665, sbg2,
g35017 or g35018 protein or a portion thereof are incorporated into an expression vector designed
for use in purification schemes employing chimeric polypeptides. In such strategies the nucleic acid
encoding the sbgl, g34665, sbg2, g35017 or g35018 protein or a portion thereof is inserted in
frame with the gene encoding the other half of the chimera. The other half of the chimera is a
β-globin or a nickel-binding polypeptide encoding sequence. A chromatography matrix having
antibody to β-globin, or nickel, attached thereto is then used to purify the chimeric protein. Protease
cleavage sites are engineered between the β-globin gene or the nickel-binding polypeptide and the
sbgl, g34665, sbg2, g35017 or g35018 protein or portion thereof. Thus, the two polypeptides of
the chimera can be separated from one another by protease digestion.
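Because the chimera depends on the insert staying in the reading frame of its fusion partner, the junction is often checked computationally before cloning. The Python sketch below is a simplified check, assuming each segment (the fusion-partner coding sequence without its stop codon, a protease-site linker, the insert) is given 5' to 3' on the coding strand. The function name and the segment sequences in the test are hypothetical; the sketch does not model the pSG5 or nickel-binding constructs themselves.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_in_frame_fusion(*segments: str) -> bool:
    """Check that coding segments joined 5' to 3' keep a single open reading frame.

    Every segment except the last must have a length divisible by 3, otherwise
    the downstream segment would be read in a shifted frame. A stop codon is
    accepted only as the final codon of the fused sequence.
    """
    segs = [s.upper() for s in segments]
    if any(len(s) % 3 for s in segs[:-1]):
        return False
    fused = "".join(segs)
    usable = len(fused) - len(fused) % 3
    codons = [fused[i:i + 3] for i in range(0, usable, 3)]
    return not any(codon in STOP_CODONS for codon in codons[:-1])
```

A junction that fails this check would need extra linker bases (or the removal of a premature stop codon) before the protease cleavage step described above could release an intact polypeptide.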

One useful expression vector for generating β-globin chimeric proteins is pSG5
(Stratagene), which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates
splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct
increases the level of expression. These techniques are well known to those skilled in the art of
molecular biology. Standard methods are published in methods texts such as Davis et al. (1986),
and many of the methods are available from Stratagene, Life Technologies, Inc., or Promega.
Polypeptides may additionally be produced from the construct using in vitro translation systems
such as the In vitro Express™ Translation Kit (Stratagene).

Antibodies That Bind sbgl, g34665, sbg2, g35017 or g35018 Polypeptides of the 
Invention 

Any sbgl, g34665, sbg2, g35017 or g35018 polypeptide or whole protein may be used
to generate antibodies capable of specifically binding to an expressed sbgl, g34665, sbg2,
g35017 or g35018 protein or fragments thereof.
For an antibody composition to specifically bind to an sbgl, g34665, sbg2, g35017 or
g35018 protein, it must demonstrate at least a 5%, 10%, 15%, 20%, 25%, 50%, or 100% greater
binding affinity for full length sbgl, g34665, sbg2, g35017 or g35018 protein than for any full
length protein in an ELISA, RIA, or other antibody-based binding assay. For an antibody
composition to specifically bind to a variant sbgl, g34665, sbg2, g35017 or g35018 protein, it
must demonstrate at least a 5%, 10%, 15%, 20%, 25%, 50%, or 100% greater binding affinity
for the respective full length variant sbgl, g34665, sbg2, g35017 or g35018 protein than for the
respective reference sbgl, g34665, sbg2, g35017 or g35018 full length protein in an ELISA,
RIA, or other antibody-based binding assay.
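As a worked illustration of these percentage thresholds, the Python sketch below compares two binding readouts. It assumes both readouts come from the same assay format and are on a linear scale (for example, background-subtracted ELISA absorbances measured with equal amounts of each antigen). The function names and example values are hypothetical.

```python
def percent_greater(target: float, reference: float) -> float:
    """Percent by which the target binding readout exceeds the reference readout."""
    if reference <= 0:
        raise ValueError("reference readout must be positive")
    return (target - reference) / reference * 100.0


def meets_specificity(target: float, reference: float, threshold_pct: float = 5.0) -> bool:
    """True if target binding exceeds reference binding by at least threshold_pct percent."""
    return percent_greater(target, reference) >= threshold_pct
```

A readout of 1.5 against a reference of 1.0 corresponds to 50% greater binding, so it satisfies the 5% to 50% thresholds listed above but not the 100% threshold.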

One antibody composition of the invention is capable of specifically binding, or
specifically binds, to the respective sbgl or g35018 proteins of SEQ ID Nos 27 to 35 and 41 to
43. Other antibody compositions of the invention are capable of specifically binding, or
specifically bind, to an sbgl, sbg2 or g35018 protein variant. Optionally, said sbgl protein
variant may be a natural variant provided in Tables 5d or 5e.

In one embodiment, the invention concerns antibody compositions, either polyclonal or
monoclonal, that are capable of selectively binding, or that selectively bind, to an
epitope-containing polypeptide comprising a contiguous span of at least 6 amino acids, preferably
at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids
of an sbgl, g34665, sbg2, g35017 or g35018 polypeptide.
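The contiguous spans referred to above can be enumerated mechanically: a polypeptide of N residues contains N - k + 1 spans of k residues. The Python sketch below lists them, for example to tile candidate immunizing peptides. The function name is hypothetical, and the sequence used in the test is a placeholder, not one of SEQ ID Nos 27 to 35 or 41 to 43.

```python
from typing import Iterator, Tuple


def contiguous_spans(protein: str, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, peptide) for every contiguous span of `length` residues."""
    if length < 1:
        raise ValueError("span length must be positive")
    # A sequence shorter than `length` yields nothing (the range is empty).
    for i in range(len(protein) - length + 1):
        yield i + 1, protein[i:i + length]
```

For an 8-residue sequence and a span length of 6, three overlapping peptides are produced, starting at positions 1, 2 and 3.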

The invention also concerns a purified or isolated antibody capable of specifically
binding to a mutated sbgl, g34665, sbg2, g35017 or g35018 protein or to a fragment or variant
thereof comprising an epitope of the mutated sbgl, g34665, sbg2, g35017 or g35018 protein. In
another preferred embodiment, the present invention concerns an antibody capable of binding to
a polypeptide comprising at least 10 consecutive amino acids of an sbgl, g34665, sbg2, g35017
or g35018 protein and including at least one of the amino acids which can be encoded by the
trait-causing mutations.

In a preferred embodiment, the invention concerns the use in the manufacture of
antibodies of a polypeptide comprising a contiguous span of at least 6 amino acids, preferably at
least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids
of any of SEQ ID Nos 27 to 35 and 41 to 43.

Non-human animals, and more particularly non-human mammals and non-human
primates, whether wild-type or transgenic, which express a different species of sbgl, g34665,
sbg2, g35017 or g35018 than the one to which antibody binding is desired, and animals which
do not express sbgl, g34665, sbg2, g35017 or g35018 (i.e. an sbgl, g34665, sbg2, g35017 or
g35018 knockout animal as described herein) are particularly useful for preparing antibodies.
sbgl, g34665, sbg2, g35017 or g35018 knockout animals will recognize all or most of the
exposed regions of an sbgl, g34665, sbg2, g35017 or g35018 protein as foreign antigens, and
therefore produce antibodies with a wider array of sbgl, g34665, sbg2, g35017 or g35018
epitopes. Moreover, smaller polypeptides with only 10 to 30 amino acids may be useful in
obtaining specific binding to any one of the sbgl, g34665, sbg2, g35017 or g35018 proteins. In
addition, the humoral immune system of animals which produce a species of sbgl, g34665,
sbg2, g35017 or g35018 that resembles the antigenic sequence will preferentially recognize the
differences between the animal's native sbgl, g34665, sbg2, g35017 or g35018 species and the
antigen sequence, and produce antibodies to these unique sites in the antigen sequence. Such a
technique will be particularly useful in obtaining antibodies that specifically bind to any one of
the sbgl, g34665, sbg2, g35017 or g35018 proteins.

Antibody preparations prepared according to either protocol are useful in quantitative 
immunoassays which determine concentrations of antigen-bearing substances in biological 
samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen 
in a biological sample. 
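Quantitative immunoassays of this kind are commonly read against a standard curve. The Python sketch below assumes a four-parameter logistic (4PL) model, a common but not the only choice, whose parameters have already been fitted to standards of known concentration. The function names are hypothetical and the parameter values in the test are invented for illustration.

```python
def four_pl(x: float, a: float, b: float, c: float, d: float) -> float:
    """4PL response at concentration x.

    a: response at zero concentration, d: response at infinite concentration,
    c: inflection point (same units as x), b: slope factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y: float, a: float, b: float, c: float, d: float) -> float:
    """Concentration that produces response y on the fitted 4PL curve."""
    if y == d:
        raise ValueError("response equals the asymptote d")
    ratio = (a - d) / (y - d) - 1.0
    if ratio <= 0:
        raise ValueError("response lies outside the range of the standard curve")
    return c * ratio ** (1.0 / b)
```

A sample response is converted to a concentration with `inverse_four_pl`; responses outside the span between the two asymptotes are rejected rather than extrapolated, which matches the usual practice of reporting such samples as above or below the quantifiable range.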

The antibodies may also be used in therapeutic compositions for killing cells expressing
the protein or reducing the levels of the protein in the body. Thus in one embodiment, the invention
comprises the use of an antibody capable of specifically recognizing sbgl, g34665, sbg2, g35017
or g35018 for the treatment of schizophrenia or bipolar disorder.

The antibodies of the invention may be labeled by any one of the radioactive, fluorescent 
or enzymatic labels known in the art. 

Consequently, the invention is also directed to a method for specifically detecting the
presence of an sbgl, g34665, sbg2, g35017 or g35018 polypeptide according to the invention in
a biological sample, said method comprising the following steps:

a) bringing into contact the biological sample with a polyclonal or monoclonal antibody
that specifically binds an sbgl, g34665, sbg2, g35017 or g35018 polypeptide, or a peptide
fragment or variant thereof; and

b) detecting the antigen-antibody complex formed. 

The invention also concerns a diagnostic kit for detecting in vitro the presence of an 
sbgl, g34665, sbg2, g35017 or g35018 polypeptide according to the present invention in a 
biological sample, wherein said kit comprises: 

a) a polyclonal or monoclonal antibody that specifically binds an sbgl, g34665, sbg2,
g35017 or g35018 polypeptide, or a peptide fragment or variant thereof, optionally labeled; and

b) a reagent allowing the detection of the antigen-antibody complexes formed, said
reagent optionally carrying a label, or being able to be recognized itself by a labeled reagent,
more particularly in the case when the above-mentioned monoclonal or polyclonal antibody is
not labeled by itself.



Biallelic markers of the invention

Advantages of the biallelic markers of the present invention
The biallelic markers of the present invention offer a number of
important advantages over other genetic markers such as RFLP (Restriction Fragment Length
Polymorphism) and VNTR (Variable Number of Tandem Repeats) markers.

The first generation of markers were RFLPs, which are variations that modify the
length of a restriction fragment. However, the methods used to identify and to type RFLPs are
relatively wasteful of materials, effort, and time. The second generation of genetic markers were
VNTRs, which can be categorized as either minisatellites or microsatellites. Minisatellites are
tandemly repeated DNA sequences present in units of 5-50 repeats which are distributed along
regions of the human chromosomes ranging from 0.1 to 20 kilobases in length. Since they present
many possible alleles, their informative content is very high. Minisatellites are scored by performing
Southern blots to identify the number of tandem repeats present in a nucleic acid sample from
the individual being tested. However, there are only 10^4 potential VNTRs that can be typed by
Southern blotting. Moreover, both RFLP and VNTR markers are costly and time-consuming to
develop and assay in large numbers.

Single nucleotide polymorphisms, or biallelic markers, can be used in the same manner as
RFLPs and VNTRs but offer several advantages. Single nucleotide polymorphisms are densely
spaced in the human genome and represent the most frequent type of variation. More than 10^7
such sites are estimated to be scattered along the 3x10^9 base pairs of the human genome.
Therefore, single nucleotide polymorphisms occur at a greater frequency and with greater
uniformity than RFLP or VNTR markers, which means that there is a greater probability that
such a marker will be found in close proximity to a genetic locus of interest. Single nucleotide
polymorphisms are less variable than VNTR markers but are mutationally more stable.

Also, the different forms of a characterized single nucleotide polymorphism, such as the 
biallelic markers of the present invention, are often easier to distinguish and can therefore be 
typed easily on a routine basis. Biallelic markers have single nucleotide based alleles and they 
have only two common alleles, which allows highly parallel detection and automated scoring. 
The biallelic markers of the present invention offer the possibility of rapid, high-throughput 
genotyping of a large number of individuals. 

Biallelic markers are densely spaced in the genome, sufficiently informative and can be
assayed in large numbers. The combined effects of these advantages make biallelic markers
extremely valuable in genetic studies. Biallelic markers can be used in linkage studies in
families, in allele sharing methods, in linkage disequilibrium studies in populations, and in
association studies of case-control populations. An important aspect of the present invention is
that biallelic markers allow association studies to be performed to identify genes involved in
complex traits. Association studies examine the frequency of marker alleles in unrelated case-
and control-populations and are generally employed in the detection of polygenic or sporadic
traits. Association studies may be conducted within the general population and are not limited
to studies performed on related individuals in affected families (linkage studies). Biallelic
markers in different genes can be screened in parallel for direct association with disease or
response to a treatment. This multiple gene approach is a powerful tool for a variety of human
genetic studies as it provides the necessary statistical power to examine the synergistic effect of
multiple genetic factors on a particular phenotype, drug response, sporadic trait, or disease state
with a complex genetic etiology.
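A minimal example of the case-control comparison described above is the allelic chi-square test on a 2x2 table of allele counts (each diploid individual contributes two alleles). The Python sketch below implements that basic Pearson test with one degree of freedom and no continuity correction. It is not necessarily the statistic used in the studies reported in this application; it treats the two alleles of each individual as independent observations, which relies on Hardy-Weinberg equilibrium, and the counts in the test are hypothetical.

```python
import math


def allelic_chi_square(case_a1: int, case_a2: int, ctrl_a1: int, ctrl_a2: int):
    """Pearson chi-square (1 df) and p-value for allele counts in cases vs. controls."""
    table = ((case_a1, case_a2), (ctrl_a1, ctrl_a2))
    row_totals = [sum(row) for row in table]
    col_totals = [case_a1 + ctrl_a1, case_a2 + ctrl_a2]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each row and column of the table needs a non-zero total")
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With 60:40 allele counts in cases and 40:60 in controls, the statistic is 8.0 and the p-value is about 0.0047. Screening many markers in parallel, as described above, would additionally call for a correction for multiple testing.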

Polymorphisms, Biallelic Markers and Polynucleotides Comprising Them
Polynucleotides of the present invention

In one aspect, the invention concerns biallelic markers associated with schizophrenia.
The invention comprises chromosome 13q31-q33-related biallelic markers, region D-related
biallelic markers, sbgl-related biallelic markers, g34665-related biallelic markers, sbg2-related
biallelic markers, g35017-related biallelic markers and g35018-related biallelic markers. The
markers and polymorphisms are generally referred to herein as A1, A2, A3 and so on. The
polymorphisms and biallelic markers of the invention comprise the biallelic markers designated
A1 to A360 in Table 6b. The polymorphisms of the invention also comprise the polymorphisms
designated A361 to A489 in Table 6c. Also included are biallelic markers in linkage
disequilibrium with the biallelic markers of the invention.
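Linkage disequilibrium between two biallelic markers is commonly summarized by D, |D'| and r². The Python sketch below computes these measures from the frequency of one two-marker haplotype and the corresponding allele frequencies. This section does not state which measure or threshold defines markers "in linkage disequilibrium"; the function name and the frequencies in the test are illustrative assumptions.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, |D'|, r^2) for alleles A and B at two biallelic markers.

    p_ab: frequency of the haplotype carrying A and B,
    p_a, p_b: frequencies of allele A at marker 1 and allele B at marker 2.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both markers must be polymorphic")
    if not (max(0.0, p_a + p_b - 1.0) <= p_ab <= min(p_a, p_b)):
        raise ValueError("haplotype frequency is inconsistent with the allele frequencies")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2
```

When the two alleles always travel together (haplotype frequency 0.5 with both allele frequencies at 0.5), |D'| and r² are both 1; when the haplotype frequency equals the product of the allele frequencies, all three measures are 0.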

Details of chromosome 13q31-q33-related biallelic markers on the subregion
designated Region D, including subregions thereof designated Regions D1, D2, D3 and D4, and on
adjacent regions referred to as Region E and Region G, are shown below and in Tables 6b and
6c. Regions D, G and E of the chromosome 13q31-q33 locus are also shown in Figure 2.
References to the corresponding SEQ ID number, to alternative marker designations and to
positions of the sequence features within the SEQ ID are given in Tables 6b and 6c for biallelic
markers A1 to A242 and A361 to A489, located in Regions D3 and D4. Further biallelic markers
from the group designated A243 to A360 in Tables 6b and 6c are located in Regions D1, D2, G
and E. The relative positions of biallelic markers on Regions G and E are further detailed below
in Table 5g; the relative positions of biallelic markers on Regions D1 and D2 are further detailed
in Table 5h.
Table 5g

Biallelic marker   Region E biallelic markers    Position on contig
A311               99-26171-71                   20778
A333               99-26173-470                  22456
A308               99-26166-257                  24731
A310               99-26169-211                  31620
A312               99-26183-156                  35869
A309               99-26167-278                  43220
A78                99-20978-89                   51405
A275               99-20983-48                   65076
A272               99-20977-72                   70519
A274               99-20981-300                  94914
A327               99-6080-99                    134366
A325               99-5912-49                    149345
A252               99-15229-412                  154582
A276               99-22310-148                  161605
A254               99-15232-291                  162153
A247               99-14021-108                  164660
A300               99-26126-498                  170445
A329               99-7337-204                   198083
A243               8-94-252                      206618
A253               99-15231-219                  212050
A246               8-98-68                       213871
A245               8-97-98                       215017
A326               99-6012-220                   216597
A255               99-15239-377                  223699
A244               8-95-43                       236882
A328               99-7308-157                   239008
A248               99-14364-415                  255729

Biallelic marker   Region G biallelic markers    Position on contig
A359               99-27912-272                  153458
A322               [illegible]                   210058
A267               99-15672-166                  266449
A283               99-25917-115                  268222
A266               99-15668-139                  778477
A282               99-25906-131                  [illegible]
A265               99-15665-398                  [illegible]
A264               99-15664-18[?]                [illegible]
A268               99-15682-318                  [illegible]
A271               99-20933-81                   [illegible]
A323               99-26238-186                  [illegible]
A302               99-26146-264                  [illegible]
A321               99-26233-275                  [illegible]
A279               99-25869-182                  [illegible]
A317               99-26222-149                  391049
A301               99-26138-193                  400078
A318               99-26223-225                  405361
A319               99-26225-148                  416529
A284               99-25924-215                  421281
A320               99-26228-172                  427201
A280               99-25881-275                  435974
A281               99-25897-264                  440452
A337               99-26769-256                  471739
A338               99-26772-268                  483511
A339               99-26776-209                  494003
A340               99-26779-437                  505947
A341               99-26781-25                   514635
A342               99-26782-300                  516212
A343               99-26783-81                   519187
A344               99-26787-96                   529412
A345               99-26789-201                  540145
A316               99-26201-267                  584018
A315               99-26191-58                   601044
A314               99-26190-20                   602591
A313               99-26189-164                  603145
A277               99-25029-241                  727473
A336               99-26559-315                  740802

Table 5h

Biallelic marker   Region D1 biallelic markers   Position on contig
A357               99-27365/421                  48742
A356               99-27361/181                  54932
A257               99-15253/382                  56599
A355               99-27360/142                  57371
A251               99-15065/85                   61002
A346               99-27297/280                  61855
A262               99-15355/150                  62749
A324               99-5873/159                   64700
A261               99-15280/432                  76977
A347               99-27306/108                  92355
A249               99-15056/99                   93854
A258               99-15256/392                  98336
A349               99-27323/372                  100260
A260               99-15261/202                  101114
A250               99-15063/155                  105587
A259               99-15258/337                  110395
A348               99-27312/58                   117521
A351               99-27345/189                  134904
A352               99-27349/267                  138974
A353               99-27352/197                  141065
A354               99-27353/105                  141494

Biallelic marker   Region D2 biallelic markers   Position on contig
[illegible]        99-26150/276                  168065
[illegible]        99-26156/290                  173255
[illegible]        99-26154/107                  175557
[illegible]        99-26153/44                   177194
[illegible]        99-25985/194                  186447
[illegible]        99-25974/143                  190018
[illegible]        [illegible]                   193065
A303               [illegible]                   196922
A285               [illegible]                   205288
A294               99-25978/166                  215025
A293               99-25977/311                  216394
A291               99-25972/317                  224712
A297               99-25984/312                  230966
A287               99-25965/399                  236799
A286               99-25961/376                  244955
A288               99-25966/241                  254680
A350               99-27335/191                  25486
A289               99-25967/57                   257662
A290               99-25969/200                  261166
A296               99-25980/173                  261957
A295               99-25979/93                   263848
A299               99-25989/398                  269515
A334               99-26267/524                  275710

The polynucleotide of the invention may consist of, consist essentially of, or comprise a
contiguous span of nucleotides of a sequence from any of SEQ ID Nos. 1 to 26, 36 to 40 and 54
to 229 as well as sequences which are complementary thereto ("complements thereof"). The
"contiguous span" may be at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, 500,
1000 or 2000 nucleotides in length, to the extent that a contiguous span of these lengths is
consistent with the lengths of the particular Sequence ID.
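Two operations recur in these polynucleotide definitions: taking the complementary sequence and checking which of the listed span lengths fit within a given SEQ ID. The Python sketch below assumes that "complementary" means the reverse complement read 5' to 3' and accepts only unambiguous A, C, G and T bases; the helper names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
SPAN_LENGTHS = (8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, 500, 1000, 2000)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unsupported bases: {sorted(invalid)}")
    return seq.translate(_COMPLEMENT)[::-1]


def allowed_span_lengths(seq_length: int, lengths=SPAN_LENGTHS) -> list:
    """Listed span lengths that fit inside a sequence of seq_length nucleotides."""
    return [n for n in lengths if n <= seq_length]
```

For a 47-nucleotide sequence, for example, only the span lengths from 8 up to 40 apply, which is the sense of "to the extent that a contiguous span of these lengths is consistent with the lengths of the particular Sequence ID".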

The present invention encompasses polynucleotides for use as primers and probes in the
methods of the invention. These polynucleotides may consist of, consist essentially of, or
comprise a contiguous span of nucleotides of a sequence from any of SEQ ID Nos. 1 to 26, 36
to 40 and 54 to 229 as well as sequences which are complementary thereto ("complements
thereof"). The "contiguous span" may be at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80,
100, 250, 500, 1000 or 2000 nucleotides in length, to the extent that a contiguous span of these
lengths is consistent with the lengths of the particular Sequence ID. It should be noted that the
polynucleotides of the present invention are not limited to having the exact flanking sequences
surrounding the polymorphic bases which are enumerated in the Sequence Listing. Rather, it
will be appreciated that the flanking sequences surrounding the biallelic markers and other
polymorphisms of the invention, or any of the primers or probes of the invention which are
more distant from the markers, may be lengthened or shortened to any extent compatible with
their intended use, and the present invention specifically contemplates such sequences. It will be
appreciated that the polynucleotides of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 may be of
any length compatible with their intended use. Also, the flanking regions outside of the
contiguous span need not be homologous to native flanking sequences which actually occur in
human subjects. The addition of any nucleotide sequence which is compatible with the
polynucleotide's intended use is specifically contemplated. The contiguous span may optionally
include the biallelic markers of the invention in said sequence. Biallelic markers generally
comprise a polymorphism at one single base position. Each biallelic marker therefore
corresponds to two forms of a polynucleotide sequence which, when compared with one
another, present a nucleotide modification at one position. Usually, the nucleotide modification
involves the substitution of one nucleotide for another. Optionally, allele 1 or allele 2 of the
biallelic markers disclosed in Table 6b may be specified as being present at the biallelic marker
of the invention. The contiguous span may optionally include a nucleotide at a polymorphism
position described in Table 6c, including single nucleotide substitutions, deletions as well as
multiple nucleotide deletions. The polymorphisms of Table 6c have not been validated as
biallelic markers, but are expected to be mostly biallelic and may also be referred to as biallelic
markers herein. Optionally, allele 1 or allele 2 of the polymorphisms of Table 6c may be
specified as being present at the polymorphism of the invention. Preferred polynucleotides may
consist of, consist essentially of, or comprise a contiguous span of nucleotides of a sequence
from SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 as well as sequences which are
complementary thereto. The "contiguous span" may be at least 8, 10, 12, 15, 18, 20, 25, 35, 40,
50, 70, 80, 100, 250, 500, 1000 or 2000 nucleotides in length, to the extent that a contiguous
span of these lengths is consistent with the lengths of the particular Sequence ID.

A preferred probe or primer comprises a nucleic acid comprising a polynucleotide
selected from the group of the nucleotide sequences of P1 to P360 and the sequences
complementary thereto, B1 to B229, C1 to C229, D1 to D360 and E1 to E360.

The invention also relates to polynucleotides that hybridize, under conditions of high or
intermediate stringency, to a polynucleotide of any of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to
229 as well as sequences which are complementary thereto. Preferably such polynucleotides
are at least 20, 25, 35, 40, 50, 70, 80, 100, 250, 500, 1000 or 2000 nucleotides in length, to the
extent that a polynucleotide of these lengths is consistent with the lengths of the particular
Sequence ID. Preferred polynucleotides comprise a polymorphism of the invention. Optionally,
either allele 1 or allele 2 of the polymorphism disclosed in Table 6c may be specified as being
present at the polymorphism of the invention. Particularly preferred polynucleotides comprise a
biallelic marker of the invention. Optionally, either allele 1 or allele 2 of the biallelic markers
disclosed in Table 6b may be specified as being present at the biallelic marker of the invention.
Conditions of high stringency are further described herein.

The primers of the present invention may be designed from the disclosed sequences for
any method known in the art. A preferred set of primers is fashioned such that the 3' end of the
contiguous span of identity with the sequences of any of SEQ ID Nos. 1 to 26, 36 to 40 and 54
to 229 is present at the 3' end of the primer. Such a configuration allows the 3' end of the
primer to hybridize to a selected nucleic acid sequence and dramatically increases the efficiency
of the primer for amplification or sequencing reactions. In a preferred set of primers the
contiguous span is found in one of the sequences described in Table 6a. Allele-specific primers
may be designed such that a biallelic marker or other polymorphism of the invention is at the 3'
end of the contiguous span and the contiguous span is present at the 3' end of the primer. Such
allele-specific primers tend to selectively prime an amplification or sequencing reaction so long
as they are used with a nucleic acid sample that contains one of the two alleles present at said
marker. The 3' end of a primer of the invention may be located within, or at least 2, 4, 6, 8, 10,
12, 15, 18, 20, 25, 50, 100, 250, 500, or 1000 nucleotides upstream of, a biallelic marker of the
invention in said sequence, or at any other location which is appropriate for its intended use in
sequencing, amplification or the location of novel sequences or markers. Primers with their 3'
ends located 1 nucleotide upstream of a biallelic marker of the invention have a special utility
in microsequencing assays. Preferred microsequencing primers are described in Table 6d.
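The two primer placements described above, a 3' end 1 nucleotide upstream of the marker (microsequencing) and a 3' end on the marker itself (allele-specific priming), can be expressed as simple slices of the sequence. The Python sketch below assumes the sequence is given 5' to 3', that `marker_pos` is the 0-based index of the polymorphic base, and that primers are taken from the same strand; the function names, the default length of 20 nucleotides and the test sequence are hypothetical. Primers on the opposite strand would be taken from the reverse complement of the downstream sequence.

```python
VALID_BASES = set("ACGT")


def microsequencing_primer(seq: str, marker_pos: int, length: int = 20) -> str:
    """Primer whose 3' end lies 1 nucleotide upstream of the marker."""
    start = marker_pos - length
    if start < 0:
        raise ValueError("not enough sequence upstream of the marker")
    return seq[start:marker_pos]


def allele_specific_primer(seq: str, marker_pos: int, allele: str, length: int = 20) -> str:
    """Primer of `length` nucleotides whose 3'-terminal base is the chosen marker allele."""
    if allele not in VALID_BASES:
        raise ValueError("allele must be one of A, C, G, T")
    start = marker_pos - length + 1
    if start < 0:
        raise ValueError("not enough sequence upstream of the marker")
    return seq[start:marker_pos] + allele
```

In a microsequencing reaction, the base incorporated immediately after the first primer's 3' end identifies the allele; with the allele-specific primer, extension is expected mainly when the template carries the allele placed at the primer's 3' end.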

The probes of the present invention may be designed from the disclosed sequences for
any method known in the art, particularly methods which allow for testing if a particular
sequence or marker disclosed herein is present. A preferred set of probes may be designed for
use in the hybridization assays of the invention in any manner known in the art such that they
selectively bind to one allele of a biallelic marker or other polymorphism, but not the other,
under any particular set of assay conditions. Preferred hybridization probes may consist of,
consist essentially of, or comprise a contiguous span which ranges in length from 8, 10, 12, 15,
18 or 20 to 25, 35, 40, 50, 60, 70, or 80 nucleotides, or be specified as being 12, 15, 18, 20, 25,
35, 40, or 50 nucleotides in length and including a biallelic marker or other polymorphism of
the invention in said sequence. In a preferred embodiment, either allele 1 or allele 2 disclosed in
Table 6b or 6c may be specified as being present at the biallelic marker site. In another
preferred embodiment, said biallelic marker may be within 6, 5, 4, 3, 2, or 1 nucleotides of the
center of the hybridization probe or at the center of said probe.
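Placing the marker at or near the probe center maximizes the destabilizing effect of a single mismatch on hybridization. The Python sketch below extracts a probe of a chosen length with the marker at its center (or one position before the midpoint for even lengths) and optionally substitutes a specified allele. It assumes a 0-based `marker_pos` on a sequence given 5' to 3'; the function name, the 25-nucleotide default and the test sequence are hypothetical.

```python
def centered_probe(seq: str, marker_pos: int, length: int = 25, allele: str = None) -> str:
    """Probe of `length` nucleotides with the marker at (or next to) its center."""
    half = (length - 1) // 2
    start = marker_pos - half
    end = start + length
    if start < 0 or end > len(seq):
        raise ValueError("marker too close to the sequence end for this probe length")
    probe = seq[start:end]
    if allele is not None:
        i = marker_pos - start  # index of the marker inside the probe
        probe = probe[:i] + allele + probe[i + 1:]
    return probe
```

Generating one probe per allele in this way gives a matched pair that differs only at the central position, a common layout for allele-discriminating hybridization assays.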

In one embodiment the invention encompasses isolated, purified, and recombinant
polynucleotides comprising, consisting of, or consisting essentially of a contiguous span of 8 to
50 nucleotides of any one of SEQ ID Nos 1 to 26, 36 to 40 and 54 to 229 and the complement
thereof, wherein said span includes a polymorphism of the invention, a chromosome 13q31-q33-
related biallelic marker, region D-related biallelic marker, or sbgl-, g34665-, sbg2-, g35017- or
g35018-related biallelic marker in said sequence; optionally, wherein said polymorphism,
chromosome 13q31-q33-related biallelic marker, region D-related biallelic marker, or sbgl-,
g34665-, sbg2-, g35017- or g35018-related biallelic marker is selected from the group
consisting of A1 to A489, and the complements thereof, or optionally the biallelic markers in
linkage disequilibrium therewith; optionally, wherein said chromosome 13q31-q33-related
biallelic marker, region D-related biallelic marker, or sbgl-, g34665-, sbg2-, g35017- or
g35018-related biallelic marker is selected from the group consisting of A1 to A69, A71 to
A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to A222,
A224 to A246, A250, A251, A253, A255, A259, A266, A268 to A232, A328 to A489; optionally,
wherein said chromosome 13q31-q33-related biallelic marker, region D-related biallelic marker,
or sbgl-, g34665-, sbg2-, g35017- or g35018-related biallelic marker is selected from the group
consisting of A1 to A69, A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177,
A179 to A197, A199 to A222, A224 to A242 and A361 to A489, and the complements thereof, or
optionally the biallelic markers in linkage disequilibrium therewith; optionally, wherein said
chromosome 13q31-q33-related biallelic marker, region D-related biallelic marker, or sbgl-,
g34665-, sbg2-, g35017- or g35018-related biallelic marker is selected from the group
consisting of A1 to A69, A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177,
A179 to A197, A199 to A222, A224 to A242, A250 to A251, A259, A269 to A270, A278,
A285 to A299, A303 to A307, A330, A334 to A335, A346 to A357 and A361 to A489, and
the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith;
optionally, wherein said contiguous span is 18 to 35 nucleotides in length and said biallelic
marker is within 4 nucleotides of the center of said polynucleotide; optionally, wherein said
polynucleotide consists of said contiguous span and said contiguous span is 25 nucleotides in
length and said biallelic marker is at the center of said polynucleotide; optionally, wherein the
3' end of said contiguous span is present at the 3' end of said polynucleotide; and optionally,
wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide and
said biallelic marker is present at the 3' end of said polynucleotide. In a preferred embodiment,
said probes comprise, consist of, or consist essentially of a sequence selected from the
following sequences: P1 to P360 and the complementary sequences thereto.
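The marker groups in these embodiments are written as comma-separated lists of single markers and ranges. The Python sketch below expands such a list into a set of marker numbers so that membership of a given marker can be checked. It assumes the "A1 to A69" style used here, tolerates a missing "A" prefix and a stray "and", and rejects descending ranges instead of silently returning nothing; the function name is hypothetical.

```python
import re


def parse_marker_ranges(text: str) -> set:
    """Expand a list such as 'A1 to A69, A71 to A74, A250' into marker numbers."""
    markers = set()
    for item in re.split(r",|\band\b", text):
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"A?(\d+)(?:\s+to\s+A?(\d+))?", item)
        if match is None:
            raise ValueError(f"cannot parse {item!r}")
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        if high < low:
            raise ValueError(f"descending range {item!r}")
        markers.update(range(low, high + 1))
    return markers
```

For instance, "A224 to A246, A250, A251" expands to 25 marker numbers. A printed range that runs backwards, such as "A268 to A232" in the lists above, raises an error, which flags that entry for checking against Tables 6b and 6c.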

In another embodiment the invention encompasses isolated, purified and recombinant 
polynucleotides comprising, consisting of, or consisting essentially of a contiguous span of 8 to
50 nucleotides of any one of SEQ ID Nos 1 to 26, 36 to 40 and 54 to 229, or the complement
thereof, wherein the 3' end of said contiguous span is located at the 3' end of said
polynucleotide, and wherein the 3' end of said polynucleotide is located within 20 nucleotides
upstream of a polymorphism of the invention, chromosome 13q31-q33-related biallelic marker,
region D-related biallelic marker, or sbg1-, g34665-, sbg2-, g35017- or g35018-related biallelic
marker in said sequence; optionally, wherein said chromosome 13q31-q33-related biallelic
marker, region D-related biallelic marker, or sbg1-, g34665-, sbg2-, g35017- or g35018-related
biallelic marker is selected from the group consisting of A1 to A489, and the complements
thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally,
wherein said chromosome 13q31-q33-related biallelic marker, region D-related biallelic
marker, or sbg1-, g34665-, sbg2-, g35017- or g35018-related biallelic marker is selected from
the group consisting of A1 to A69, A71 to A74, A76 to A94, A96 to A106, A108 to A112,
A114 to A177, A179 to A197, A199 to A222, A224 to A246, A250, A251, A253, A255, A259,
A266, A268 to A232, A328 to A360, and A361 to A489, and the complements thereof, or
optionally the biallelic markers in linkage disequilibrium therewith; optionally, wherein said
chromosome 13q31-q33-related biallelic marker, region D-related biallelic marker, or sbg1-,
g34665-, sbg2-, g35017- or g35018-related biallelic marker is selected from the group
consisting of A1 to A69, A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177,
A179 to A197, A199 to A222, A224 to A242 and A361 to A489; optionally, wherein said
chromosome 13q31-q33-related biallelic marker, region D-related biallelic marker, or sbg1-,
g34665-, sbg2-, g35017- or g35018-related biallelic marker is selected from the group
consisting of optionally, wherein said chromosome 13q31-q33-related biallelic marker, region
D-related biallelic marker, or sbg1-, g34665-, sbg2-, g35017- or g35018-related biallelic
marker is selected from the group consisting of A1 to A69, A71 to A74, A76 to A94, A96 to
A106, A108 to A112, A114 to A177, A179 to A197, A199 to A222, A224 to A242, A250 to
A251, A259, A269 to A270, A278, A285 to A299, A303 to A307, A330, A334 to A335, A346
to A357 and A361 to A489, and the complements thereof, or optionally the biallelic markers in
linkage disequilibrium therewith; optionally, wherein the 3' end of said polynucleotide is
located 1 nucleotide upstream of said chromosome 13q31-q33-related biallelic marker, region
D-related biallelic marker, or sbg1-, g34665-, sbg2-, g35017- or g35018-related biallelic
marker; and optionally, wherein said polynucleotide comprises, consists of, or consists
essentially of a sequence selected from the following sequences: D1 to D360 and E1 to E360.
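The positional constraint above (a primer whose 3' end lies within 20 nucleotides upstream of the marker, or 1 nucleotide upstream of it for microsequencing) can be illustrated with a short Python sketch. It assumes the marker is given as a 0-based index into a plain A/C/G/T string and that a primer for the opposite strand is simply the reverse complement of the downstream bases; the function names are hypothetical, and the sketch is not the procedure used to derive sequences D1 to D360 and E1 to E360.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_primers(seq: str, marker: int, length: int, gap: int = 1) -> tuple[str, str]:
    """Return (forward, reverse) primers whose 3' ends lie `gap` nucleotides
    upstream of the marker at 0-based index `marker`, one on each strand.

    gap=1 puts the 3' end immediately next to the marker, the layout used
    for a microsequencing primer; the text allows up to 20 nucleotides.
    """
    if not 1 <= gap <= 20:
        raise ValueError("gap must be between 1 and 20 nucleotides")
    if not 8 <= length <= 50:
        raise ValueError("primer length must be between 8 and 50 nucleotides")
    # Forward strand: the last (3') primer base sits at index marker - gap.
    f_end = marker - gap + 1
    f_start = f_end - length
    # Reverse strand: reverse complement of the bases starting `gap`
    # positions downstream of the marker on the given strand.
    r_start = marker + gap
    r_end = r_start + length
    if f_start < 0 or r_end > len(seq):
        raise ValueError("not enough flanking sequence for the requested primers")
    return seq[f_start:f_end], reverse_complement(seq[r_start:r_end])
```

With `gap=1` the forward primer is simply the `length` bases immediately 5' of the marker, so a single-base extension from its 3' end reads the marker nucleotide.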

In a further embodiment, the invention encompasses isolated, purified, or recombinant
polynucleotides comprising, consisting of, or consisting essentially of a sequence selected from
the following sequences: B1 to B229 and C1 to C229.

In an additional embodiment, the invention encompasses polynucleotides for use in
hybridization assays, sequencing assays, and enzyme-based mismatch detection assays for
determining the identity of the nucleotide at a chromosome 13q31-q33-related biallelic marker,
region D-related biallelic marker, or sbg1-, g34665-, sbg2-, g35017- or g35018-related biallelic
marker in any of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 or the complement thereof, as
well as polynucleotides for use in amplifying segments of nucleotides comprising a
polymorphism of the invention, a chromosome 13q31-q33-related biallelic marker, region D-
related biallelic marker, or sbg1-, g34665-, sbg2-, g35017- or g35018-related biallelic marker
in any of SEQ ID Nos 1 to 26, 36 to 40 and 54 to 229 or the complement thereof; optionally,
wherein said chromosome 13q31-q33-related biallelic marker, region D-related biallelic marker,
or sbg1-, g34665-, sbg2-, g35017- or g35018-related biallelic marker is selected from the group
consisting of A1 to A489, and the complements thereof, or optionally the biallelic markers in
linkage disequilibrium therewith; optionally, wherein said chromosome 13q31-q33-related
biallelic marker, region D-related biallelic marker, or sbg1-, g34665-, sbg2-, g35017- or g35018-
related biallelic marker is selected from the group consisting of A1 to A69, A71 to A74, A76 to
A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to A222, A224 to
A246, A250, A251, A253, A255, A259, A266, A268 to A232, A328 to A360 and A361 to A489,
and the complements thereof, or optionally the biallelic markers in linkage disequilibrium
therewith; optionally, wherein said chromosome 13q31-q33-related biallelic marker, region D-
related biallelic marker, or sbg1-, g34665-, sbg2-, g35017- or g35018-related biallelic marker is
selected from the group consisting of A1 to A69, A71 to A74, A76 to A94, A96 to A106, A108
to A112, A114 to A177, A179 to A197, A199 to A222, A224 to A242, A250 to A251, A259,
A269 to A270, A278, A285 to A299, A303 to A307, A330, A334 to A335 and A346 to A357 and
A361 to A489, and the complements thereof, or optionally the biallelic markers in linkage
disequilibrium therewith; and optionally, wherein said chromosome 13q31-q33-related biallelic
marker, region D-related biallelic marker, or sbg1-, g34665-, sbg2-, g35017- or g35018-related
biallelic marker is selected from the group consisting of A1 to A69, A71 to A74, A76 to A94,
A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to A222, A224 to A242 and
A361 to A489, and the complements thereof, or optionally the biallelic markers in linkage
disequilibrium therewith.

These arrays may generally be produced using mechanical synthesis methods or light-
directed synthesis methods, which incorporate a combination of photolithographic methods and
solid phase oligonucleotide synthesis (Fodor et al., Science, 251:767-777, 1991). The
immobilization of arrays of oligonucleotides on solid supports has been rendered possible by
the development of a technology generally identified as "Very Large Scale Immobilized
Polymer Synthesis" (VLSIPS™), in which probes are immobilized in a high-density array on a
solid surface of a chip. Examples of VLSIPS™ technologies are provided in US Patents
5,143,854 and 5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and
WO 95/11995, which describe methods for forming oligonucleotide arrays through techniques
such as light-directed synthesis. In designing strategies aimed at providing arrays of
nucleotides immobilized on solid supports, further presentation strategies were developed to
order and display the oligonucleotide arrays on the chips in an attempt to maximize
hybridization patterns and sequence information. Examples of such presentation strategies are
disclosed in PCT Publications WO 94/12305, WO 94/11530, WO 97/29212 and WO 97/31256.

Oligonucleotide arrays may comprise at least one of the sequences selected from the
group consisting of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, and the sequences
complementary thereto, or a fragment thereof of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70,
80, 100, 250, 500, 1000 or 2000 consecutive nucleotides, to the extent that fragments of these
lengths are consistent with the lengths of the particular Sequence ID, for determining whether a
sample contains one or more alleles of the biallelic markers of the present invention.
Oligonucleotide arrays may also comprise at least one of the sequences selected from the group
consisting of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, and the sequences complementary
thereto, or a fragment thereof of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250,
500, 1000 or 2000 consecutive nucleotides, to the extent that fragments of these lengths are
consistent with the lengths of the particular Sequence ID, for amplifying one or more alleles of
the biallelic markers of Table 6b or polymorphisms of Table 6c. In other embodiments, arrays
may also comprise at least one of the sequences selected from the group consisting of SEQ ID
Nos. 1 to 26, 36 to 40 and 54 to 229, and the sequences complementary thereto, or a fragment
thereof of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, 500, 1000 or 2000
consecutive nucleotides, to the extent that fragments of these lengths are consistent with the
lengths of the particular Sequence ID, for conducting microsequencing analyses to determine
whether a sample contains one or more alleles of the biallelic markers of the invention. In still
further embodiments, the oligonucleotide array may comprise at least one of the sequences
selected from the group consisting of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, and the
sequences complementary thereto, or a fragment thereof of at least 8, 10, 12, 15, 18, 20, 25, 35,
40, 50, 70, 80, 100, 250, 500, 1000 or 2000 nucleotides in length, to the extent that fragments
of these lengths are consistent with the lengths of the particular Sequence ID, for determining
whether a sample contains one or more alleles of the polymorphisms and biallelic markers of
the present invention.
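As a rough illustration of how an array can distinguish the two alleles of a biallelic marker, the Python sketch below builds one probe per allele centred on the marker and then reports which probes occur in a sample sequence. Exact substring matching is used here as a crude, assumed stand-in for hybridization; real array design must also account for melting temperature, mismatch position and cross-hybridization. The 12-nucleotide flank (giving 25-mer probes) is an arbitrary choice, and the function names are hypothetical.

```python
def allele_specific_probes(seq: str, marker: int, alleles: tuple[str, ...],
                           flank: int = 12) -> dict[str, str]:
    """Return one probe per allele, centred on the marker at 0-based index `marker`."""
    if marker - flank < 0 or marker + flank + 1 > len(seq):
        raise ValueError("not enough flanking sequence around the marker")
    left = seq[marker - flank:marker]
    right = seq[marker + 1:marker + 1 + flank]
    # Each probe is identical except at its central position.
    return {allele: left + allele + right for allele in alleles}


def alleles_detected(sample: str, probes: dict[str, str]) -> set[str]:
    """Crude stand-in for hybridization: an allele counts as detected when its
    probe sequence occurs exactly in the sample sequence."""
    return {allele for allele, probe in probes.items() if probe in sample}
```

A sample heterozygous at the marker would contain both probe sequences and therefore report both alleles.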

A further object of the invention relates to an array of nucleic acid sequences
comprising either at least one of the sequences selected from the group consisting of P1 to
P360, B1 to B229, C1 to C229, D1 to D360, E1 to E360 or the sequences complementary thereto
or a fragment thereof of at least 8, 10, 12, 15, 18, 20, 25, 30, or 40 consecutive nucleotides
thereof, or at least one sequence comprising at least 1, 2, 3, 4, 5, 10 or 20 biallelic markers
selected from the group consisting of A1 to A489 or the complements thereof. The invention
also pertains to an array of nucleic acid sequences comprising either at least 1, 2, 3, 4, 5, 10 or 20
of the sequences selected from the group consisting of P1 to P360, B1 to B229, C1 to C229, D1
to D360, E1 to E360 or the sequences complementary thereto or a fragment thereof of at least 8
consecutive nucleotides thereof, or at least two sequences comprising a biallelic marker selected
from the group consisting of A1 to A360 or the complements thereto.

The present invention also encompasses diagnostic kits comprising one or more 
polynucleotides of the invention, optionally with a portion or all of the necessary reagents and 
instructions for genotyping a test subject by determining the identity of a nucleotide at a
biallelic marker of the invention. The polynucleotides of a kit may optionally be attached to a
solid support, or be part of an array or addressable array of polynucleotides. The kit may
provide for the determination of the identity of the nucleotide at a marker position by any
method known in the art including, but not limited to, a sequencing assay method, a
microsequencing assay method, a hybridization assay method, or an enzyme-based mismatch
detection assay. Optionally such a kit may include instructions for scoring the results of the
determination with respect to the test subject's predisposition to schizophrenia, or likely
response to an agent acting on schizophrenia, or chances of suffering from side effects to an 
agent acting on schizophrenia. 

Finally, in any embodiments of the present invention, a biallelic marker may
optionally comprise:

(a) a biallelic marker selected from the group consisting of sbg1-related markers A85 to
A219, or more preferably a biallelic marker selected from the group consisting of sbg1-related
markers A85 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197 and A199 to
A219;

(b) a biallelic marker selected from the group consisting of g34665-related markers
A230 to A236;

(c) a biallelic marker selected from the group consisting of sbg2-related markers A79 to
A99;

(d) the g35017-related marker A41;

(e) a biallelic marker selected from the group consisting of g35018-related markers A1
to A39;

(f) a biallelic marker selected from the group consisting of A239, A227, A198, A228,
A223, A107, A218, A270, A75, A62, A65 and A70;

(g) a biallelic marker selected from the group consisting of A48, A60, A61, A62, A65,
A70, A75, A76, A80, A107, A108, A198, A218, A221, A223, A227, A228, A239, A285,
A286, A287, A288, A290, A292, A293, A295, A299 and A304;

(h) a biallelic marker selected from the group consisting of A304, A307, A305, A298,
A292, A293, A291, A287, A286, A288, A289, A290, A295, A299, A241, A239, A228,
A227, A223, A221, A218, A198, A178, A108, A107, A80, A75, A70, A65 and
A62; and/or

(i) a biallelic marker selected from the group consisting of A304, A307, A305, A298,
A292, A293, A291, A287, A286, A288, A289, A290, A295, A299, A241, A239, A228, A227,
A223, A221, A218, A198, A178, A108, A107, A80, A76, A75, A70, A65, A62, A61, A60 and
A48.

Optionally, in any of the embodiments described herein, a Region D- or chromosome
13q31-q33-related biallelic marker may be selected from the group consisting of A1 to A69,
A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to
A222, A224 to A242, A250 to A251, A259, A269 to A270, A278, A285 to A299, A303 to
A307, A330, A334 to A335, A346 to A357 and A361 to A489. Optionally, in any of the
embodiments described herein, a chromosome 13q31-q33-related biallelic marker may be
selected from the group consisting of A1 to A69, A71 to A74, A76 to A94, A96 to A106, A108
to A112, A114 to A177, A179 to A197, A199 to A222, A224 to A246, A250, A251, A253,
A255, A259, A266, A268 to A232 and A328 to A489. A set of said Region D-related biallelic
markers or chromosome 13q31-q33-related biallelic markers may comprise at least 1, 2, 3, 4, 5,
10, 20, 40, 50, 100 or 200 of said biallelic markers, respectively.

Optionally, any of the compositions or methods described herein may specifically
exclude at least 1, 2, 3, 4, 5, 10 or 20 biallelic markers, or all of the biallelic markers, selected from
the group consisting of: A70, A75, A95, A107, A113, A178, A198, A223, A247 to A249, A252,
A254, A256 to A258, A260 to A265, A267, A324 to A328.

Furthermore, in any of the embodiments of the present invention, a set of chromosome
13q31-q33-related biallelic markers, Region D-related biallelic markers, or sbg1-, g34665-,
sbg2-, g35017- or g35018-related biallelic markers may comprise at least 1, 2, 3, 4, 5, 10, 20,
40, 50, 100 or 200 of said biallelic markers.

Methods For De Novo Identification Of Biallelic Markers 

Any of a variety of methods can be used to screen a genomic fragment for single
nucleotide polymorphisms, such as differential hybridization with oligonucleotide probes,
detection of changes in mobility measured by gel electrophoresis, or direct sequencing of the
amplified nucleic acid. A preferred method for identifying biallelic markers involves
comparative sequencing of genomic DNA fragments from an appropriate number of unrelated 
individuals. 

In a first embodiment, DNA samples from unrelated individuals are pooled together, 
following which the genomic DNA of interest is amplified and sequenced. The nucleotide 
sequences thus obtained are then analyzed to identify significant polymorphisms. One of the 
major advantages of this method resides in the fact that the pooling of the DNA samples
substantially reduces the number of DNA amplification reactions and sequencing reactions
that must be carried out. Moreover, this method is sufficiently sensitive that a biallelic
marker obtained thereby usually demonstrates a sufficient frequency of its less common allele 
to be useful in conducting association studies. Usually, the frequency of the least common 
allele of a biallelic marker identified by this method is at least 10%. 

In a second embodiment, the DNA samples are not pooled and are therefore amplified 
and sequenced individually. This method is usually preferred when biallelic markers need to be 
identified in order to perform association studies within candidate genes. Preferably, highly 
relevant gene regions such as promoter regions or exon regions may be screened for biallelic 
markers. A biallelic marker obtained using this method may show a lower degree of
informativeness for conducting association studies, e.g. if the frequency of its less frequent
allele is less than about 10%. Such a biallelic marker will, however, be sufficiently
informative to conduct association studies, and it will further be appreciated that including less
informative biallelic markers in the genetic analysis studies of the present invention may allow
in some cases the direct identification of causal mutations, which may, depending on their
penetrance, be rare mutations.

The following is a description of the various parameters of a preferred method used by
the inventors for the identification of the biallelic markers of the present invention.
Genomic DNA samples 

The genomic DNA samples from which the biallelic markers of the present invention
are generated are preferably obtained from unrelated individuals corresponding to a
heterogeneous population of known ethnic background. The number of individuals from whom
DNA samples are obtained can vary substantially, preferably from about 10 to about 1000, more
preferably from about 50 to about 200 individuals. Usually, DNA samples are collected from at 
least about 100 individuals in order to have sufficient polymorphic diversity in a given 
population to identify as many markers as possible and to generate statistically significant 
results. 

As for the source of the genomic DNA to be subjected to analysis, any test sample can 
be foreseen without any particular limitation. These test samples include biological samples
which can be tested by the methods of the present invention described herein, and include
human and animal body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine,
lymph fluids, and various external secretions of the respiratory, intestinal and genitourinary
tracts, tears, saliva, milk, white blood cells, myelomas and the like; biological fluids such as cell
culture supernatants; fixed tissue specimens including tumor and non-tumor tissue and lymph
node tissues; bone marrow aspirates and fixed cell specimens. The preferred source of genomic
DNA used in the present invention is peripheral venous blood of each donor. Techniques
to prepare genomic DNA from biological samples are well known to the skilled technician.
Details of a preferred embodiment are provided in Example 1. The person skilled in the art can
choose to amplify pooled or unpooled DNA samples. 
DNA Amplification 

The identification of biallelic markers in a sample of genomic DNA may be facilitated
through the use of DNA amplification methods. DNA samples can be pooled or unpooled for
the amplification step. DNA amplification techniques are well known to those skilled in the art.
Various methods to amplify DNA fragments carrying biallelic markers are further described
hereinafter. PCR technology is the preferred amplification technique used to
identify new biallelic markers.

In a first embodiment, biallelic markers are identified using genomic sequence
information generated by the inventors. Genomic DNA fragments, such as the inserts of the
BAC clones described above, are sequenced and used to design primers for the amplification of
500 bp fragments. These 500 bp fragments are amplified from genomic DNA and are scanned
for biallelic markers. Primers may be designed using the OSP software (Hillier L. and Green 
P., 1991). All primers may contain, upstream of the specific target bases, a common 
oligonucleotide tail that serves as a sequencing primer. Those skilled in the art are familiar with 
primer extensions, which can be used for these purposes. 
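The amplicon design step can be sketched as tiling a sequenced insert with fixed-size windows and then prefixing each target-specific primer with a common 5' tail. The overlap between adjacent 500 bp windows, the tail passed in by the caller and the function names are assumptions made for illustration; the text does not specify them, and actual primer selection (for example with the OSP software mentioned above) also weighs melting temperature and primer specificity.

```python
def amplicon_windows(seq_len: int, size: int = 500, overlap: int = 50) -> list[tuple[int, int]]:
    """Return half-open (start, end) windows of `size` bp covering 0..seq_len,
    with adjacent windows overlapping by at least `overlap` bp (an assumed value)."""
    if size <= overlap:
        raise ValueError("window size must exceed the overlap")
    if seq_len <= size:
        return [(0, seq_len)]
    step = size - overlap
    windows = []
    start = 0
    while start + size < seq_len:
        windows.append((start, start + size))
        start += step
    # The final window is placed flush with the end of the sequence.
    windows.append((seq_len - size, seq_len))
    return windows


def tailed_primer(specific: str, tail: str) -> str:
    """Attach a common tail upstream (5') of the target-specific bases.
    The tail sequence itself is not given in the text and is supplied by the caller."""
    return tail + specific
```

Primers would then be chosen near each window boundary so that every base of the insert falls inside at least one amplified fragment.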

In another embodiment of the invention, genomic sequences of candidate genes are 
available in public databases allowing direct screening for biallelic markers. Preferred primers, 
useful for the amplification of genomic sequences encoding the candidate genes, focus on 
promoters, exons and splice sites of the genes. A biallelic marker present in these functional
regions of the gene has a higher probability of being a causal mutation.

Sequencing Of Amplified Genomic DNA And Identification Of Single Nucleotide 
Polymorphisms 

The amplification products generated as described above are then sequenced using any
method known and available to the skilled technician. Methods for sequencing DNA using 
either the dideoxy-mediated method (Sanger method) or the Maxam-Gilbert method are widely 
known to those of ordinary skill in the art. Such methods are for example disclosed in Maniatis 
et al. (Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Second Edition, 
1989). Alternative approaches include hybridization to high-density DNA probe arrays as 
described in Chee et al. (Science 274, 610, 1996). 

Preferably, the amplified DNA is subjected to automated dideoxy terminator 
sequencing reactions using a dye-primer cycle sequencing protocol. The products of the 
sequencing reactions are run on sequencing gels and the sequences are determined using gel 
image analysis. The polymorphism search is based on the presence of superimposed peaks in 
the electrophoresis pattern resulting from different bases occurring at the same position. 
Because each dideoxy terminator is labeled with a different fluorescent molecule, the two peaks 
corresponding to a biallelic site present distinct colors corresponding to two different 
nucleotides at the same position on the sequence. However, the presence of two peaks can be 
an artifact due to background noise. To exclude such an artifact, the two DNA strands are 
sequenced and a comparison between the peaks is carried out. In order to be registered as a 
polymorphic sequence, the polymorphism has to be detected on both strands. 
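The double-peak rule described above, including the requirement that the polymorphism be seen on both strands, can be expressed as a small Python sketch. It assumes each trace has already been reduced to one peak height per base (A, C, G, T) at every position, that the reverse-strand trace is indexed 5' to 3' along the complementary strand, and that a secondary peak counts when it reaches a fixed fraction of the main peak; the 0.3 cut-off and the function names are hypothetical, not values taken from the text.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def double_peak(heights: dict[str, float], min_ratio: float) -> Optional[frozenset]:
    """Return the two highest-peak bases if the second reaches `min_ratio`
    of the first; otherwise None (a single clean peak)."""
    ranked = sorted(heights, key=heights.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    if heights[top] > 0 and heights[runner_up] / heights[top] >= min_ratio:
        return frozenset((top, runner_up))
    return None


def call_polymorphisms(forward, reverse, min_ratio=0.3):
    """Return {position: (allele1, allele2)} for sites showing matching double
    peaks on both strands. `forward` and `reverse` are lists of per-position
    peak-height dicts; `reverse` runs 5'->3' along the complementary strand."""
    n = len(forward)
    if len(reverse) != n:
        raise ValueError("both strand traces must cover the same region")
    calls = {}
    for i in range(n):
        fwd = double_peak(forward[i], min_ratio)
        rev = double_peak(reverse[n - 1 - i], min_ratio)  # same site, other strand
        if fwd is None or rev is None:
            continue  # double peak on at most one strand: treated as background noise
        rev_on_forward = frozenset(COMPLEMENT[b] for b in rev)
        if fwd == rev_on_forward:  # both strands must report the same two bases
            calls[i] = tuple(sorted(fwd))
    return calls
```

A double peak present on only one strand, or two strands disagreeing on which bases are present, produces no call, which mirrors the artifact-exclusion step in the text.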

The above procedure permits the identification of those amplification products that contain
biallelic markers. The detection limit for the frequency of biallelic polymorphisms
detected by sequencing pools of 100 individuals is approximately 0.1 for the minor allele, as
verified by sequencing pools of known allelic frequencies. However, more than 90% of the
biallelic polymorphisms detected by the pooling method have a frequency for the minor allele
higher than 0.25. Therefore, the biallelic markers selected by this method have a frequency of
at least 0.1 for the minor allele and less than 0.9 for the major allele, preferably at least 0.2 for
the minor allele and less than 0.8 for the major allele, more preferably at least 0.3 for the minor
allele and less than 0.7 for the major allele, thus a heterozygosity rate higher than 0.18,
preferably higher than 0.32, more preferably higher than 0.42.
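The heterozygosity figures quoted above follow from Hardy-Weinberg proportions: for a biallelic marker with allele frequencies p and q = 1 - p, the expected heterozygosity is H = 1 - p² - q² = 2pq. Assuming Hardy-Weinberg equilibrium (an assumption not stated in the text), a short Python check reproduces the thresholds 0.18, 0.32 and 0.42 for minor allele frequencies of 0.1, 0.2 and 0.3; the function name is hypothetical.

```python
def expected_heterozygosity(minor_freq: float) -> float:
    """Expected heterozygosity 1 - p^2 - q^2 (= 2pq) under Hardy-Weinberg equilibrium."""
    if not 0.0 <= minor_freq <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    major_freq = 1.0 - minor_freq
    return 1.0 - minor_freq ** 2 - major_freq ** 2


for p in (0.1, 0.2, 0.3):
    print(p, round(expected_heterozygosity(p), 2))
# prints:
# 0.1 0.18
# 0.2 0.32
# 0.3 0.42
```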

In another embodiment, biallelic markers are detected by sequencing individual DNA
samples; the frequency of the minor allele of such a biallelic marker may be less than 0.1.

Validation of the biallelic markers of the present invention

The polymorphisms are evaluated for their usefulness as genetic markers by validating 
that both alleles are present in a population. Validation of the biallelic markers is accomplished 
by genotyping a group of individuals by a method of the invention and demonstrating that both 
alleles are present. Microsequencing is a preferred method of genotyping alleles. The 
validation by genotyping step may be performed on individual samples derived from each 
individual in the group or by genotyping a pooled sample derived from more than one 
individual. The group can be as small as one individual if that individual is heterozygous for 
the allele in question. Preferably the group contains at least three individuals, more preferably 
the group contains five or six individuals, so that a single validation test will be more likely to 
result in the validation of more of the biallelic markers that are being tested. It should be noted 
however, that when the validation test is performed on a small group it may result in a false 
negative result if as a result of sampling error none of the individuals tested carries one of the 
two alleles. Thus, the validation process is less useful in demonstrating that a particular initial 
result is an artifact, than it is at demonstrating that there is a bona fide biallelic marker at a 
particular position in a sequence. All of the genotyping, haplotyping, association, and
interaction study methods of the invention may optionally be performed solely with validated 
biallelic markers. 
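The trade-off between group size and the risk of a false negative can be quantified under simple assumptions: Hardy-Weinberg equilibrium, randomly sampled unrelated individuals and error-free genotyping. With minor allele frequency p, the chance that both alleles appear among the 2n chromosomes of n individuals is 1 - (1 - p)^(2n) - p^(2n). The Python sketch below evaluates that expression; it is illustrative only, and the function name is hypothetical.

```python
def prob_both_alleles_seen(minor_freq: float, n_individuals: int) -> float:
    """Probability that both alleles are observed among 2n sampled chromosomes,
    assuming Hardy-Weinberg equilibrium and error-free genotyping."""
    if n_individuals < 1:
        raise ValueError("need at least one individual")
    chromosomes = 2 * n_individuals
    major_freq = 1.0 - minor_freq
    # Subtract the two ways of seeing only one allele.
    return 1.0 - major_freq ** chromosomes - minor_freq ** chromosomes


for n in (1, 3, 6):
    print(n, round(prob_both_alleles_seen(0.1, n), 3))
# prints:
# 1 0.18
# 3 0.469
# 6 0.718
```

For a marker with a minor allele frequency of 0.1, a single individual validates it only when heterozygous (probability 0.18), whereas a group of six raises the chance to about 0.72, consistent with the preference for groups of five or six individuals.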

Evaluation of the frequency of the biallelic markers of the present invention

The validated biallelic markers are further evaluated for their usefulness as genetic 
markers by determining the frequency of the least common allele at the biallelic marker site. 
The determination of the least common allele is accomplished by genotyping a group of 
individuals by a method of the invention and demonstrating that both alleles are present. This 
determination of frequency by genotyping step may be performed on individual samples derived 
from each individual in the group or by genotyping a pooled sample derived from more than 
one individual. The group must be large enough to be representative of the population as a 
whole. Preferably the group contains at least 20 individuals, more preferably the group contains
at least 50 individuals, most preferably the group contains at least 100 individuals. Of course 
the larger the group the greater the accuracy of the frequency determination because of reduced 
sampling error. A biallelic marker wherein the frequency of the less common allele is 30% or 
more is termed a "high quality biallelic marker." All of the genotyping, haplotyping, 
association, and interaction study methods of the invention may optionally be performed solely 
with high quality biallelic markers. 
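The statement that larger groups give more accurate frequency estimates can be made concrete with the binomial standard error of an allele frequency estimated from n individuals (2n chromosomes), SE = sqrt(p(1 - p) / 2n). This assumes independently sampled chromosomes under Hardy-Weinberg equilibrium. The sketch below computes it for the group sizes mentioned above and applies the 30% threshold for a high quality biallelic marker; the function names are hypothetical.

```python
import math


def allele_freq_standard_error(freq: float, n_individuals: int) -> float:
    """Binomial standard error of an allele frequency estimated from 2n chromosomes."""
    if n_individuals < 1:
        raise ValueError("need at least one individual")
    return math.sqrt(freq * (1.0 - freq) / (2 * n_individuals))


def is_high_quality(minor_freq: float) -> bool:
    """True when the less common allele has a frequency of 30% or more."""
    return minor_freq >= 0.30


for n in (20, 50, 100):
    print(n, round(allele_freq_standard_error(0.3, n), 3))
# prints:
# 20 0.072
# 50 0.046
# 100 0.032
```

With p = 0.3, the standard error falls from roughly 0.072 with 20 individuals to roughly 0.032 with 100, so a marker whose estimate lies near the 30% threshold may be classified differently depending on group size.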

Another embodiment of the invention comprises methods of estimating the frequency 
of an allele in a population comprising genotyping individuals from said population for a 
13q31-q33-related biallelic marker and determining the proportional representation of said
biallelic marker in said population. In addition, the methods of estimating the frequency of an
allele in a population encompass methods with any further limitation described in this
disclosure, or those following, specified alone or in any combination: Optionally, said 13q31-q33-
related biallelic marker may be in a sequence selected individually or in any combination
from the group consisting of SEQ ID Nos 1 to 26, 36 to 40 and 54 to 229; and the complements
thereof; optionally, said 13q31-q33-related biallelic marker may be selected from the biallelic
markers described in Table 6b or 6c; optionally, determining the frequency of a biallelic marker 
allele in a population may be accomplished by determining the identity of the nucleotides for 
both copies of said biallelic marker present in the genome of each individual in said population 
and calculating the proportional representation of said nucleotide at said 13q31-q33-related 
biallelic marker for the population; optionally, determining the frequency of a biallelic marker 
allele in a population may be accomplished by performing a genotyping method on a pooled 
biological sample derived from a representative number of individuals, or each individual, in 
said population, and calculating the proportional amount of said nucleotide compared with the 
total. 
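The two frequency-estimation routes described above, counting both copies of the marker in each genotyped individual or measuring the proportional amount of each nucleotide in a pooled sample, can be sketched in Python as follows. Genotypes are assumed to be pairs of nucleotide calls, and pooled results are assumed to be per-nucleotide signal intensities that scale linearly with allele amount; the function names are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes: list[tuple[str, str]]) -> dict[str, float]:
    """Allele frequencies from individual genotypes, counting both copies per person."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    counts = Counter()
    for first, second in genotypes:
        counts[first] += 1
        counts[second] += 1
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def pooled_frequencies(signals: dict[str, float]) -> dict[str, float]:
    """Allele frequencies from a pooled sample, as each nucleotide's share of the
    total signal (assumes signal is proportional to allele amount)."""
    total = sum(signals.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {allele: value / total for allele, value in signals.items()}


print(allele_frequencies([("A", "A"), ("A", "G"), ("A", "A")]))
print(pooled_frequencies({"A": 300.0, "G": 100.0}))
```

In the first call, three individuals contribute six chromosomes, five carrying A and one carrying G.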

Methods Of Genotyping An Individual For Biallelic Markers 

Methods are provided to genotype a biological sample for one or more biallelic markers
of the present invention, all of which may be performed in vitro. Such methods of genotyping
comprise determining the identity of a nucleotide at a biallelic marker of the invention by any
method known in the art. These methods find use in genotyping case-control populations in
association studies as well as individuals in the context of detection of alleles of biallelic
markers which are known to be associated with a given trait, in which case both copies of the
biallelic marker present in the individual's genome are determined so that an individual may be
classified as homozygous or heterozygous for a particular allele.

These genotyping methods can be performed on nucleic acid samples derived from a single
individual or pooled DNA samples.

Genotyping can be performed using similar methods as those described above for the 
identification of the biallelic markers, or using other genotyping methods such as those further 
described below. In preferred embodiments, the comparison of sequences of amplified genomic 
fragments from different individuals is used to identify new biallelic markers whereas 
microsequencing is used for genotyping known biallelic markers in diagnostic and association 
study applications. 

Another embodiment of the invention encompasses methods of genotyping a biological 
sample comprising determining the identity of a nucleotide at a 13q31-q33-related biallelic
marker. In addition, the genotyping methods of the invention encompass methods with any
further limitation described in this disclosure, or those following, specified alone or in any
combination: Optionally, said 13q31-q33-related biallelic marker may be in a sequence
selected individually or in any combination from the group consisting of SEQ ID Nos 1 to 26,
36 to 40 and 54 to 229, and the complements thereof; optionally, said 13q31-q33-related
biallelic marker may be selected individually or in any combination from the biallelic markers
described in Table 6b and 6c; optionally, said method further comprises determining the identity
of a second nucleotide at said biallelic marker, wherein said first nucleotide and second
nucleotide are not base paired (by Watson & Crick base pairing) to one another; optionally, said
biological sample is derived from a single individual or subject; optionally, said method is
performed in vitro; optionally, said biallelic marker is determined for both copies of said 
biallelic marker present in said individual's genome; optionally, said biological sample is 
derived from multiple subjects or individuals; optionally, said method further comprises 
amplifying a portion of said sequence comprising the biallelic marker prior to said determining 
step; optionally, wherein said amplifying is performed by PCR, LCR, or replication of a 
recombinant vector comprising an origin of replication and said portion in a host cell; 
optionally, wherein said determining is performed by a hybridization assay, sequencing assay, 
microsequencing assay, or an enzyme-based mismatch detection assay. 
Source of DNA for genotyping

Any source of nucleic acids, in purified or non-purified form, can be utilized as the 
starting nucleic acid, provided it contains or is suspected of containing the specific nucleic acid 
sequence desired. DNA or RNA may be extracted from cells, tissues, body fluids and the like 
as described herein. While nucleic acids for use in the genotyping methods of the invention can
be derived from any mammalian source, the test subjects and individuals from which nucleic
acid samples are taken are generally understood to be human.

Amplification Of DNA Fragments Comprising Biallelic Markers

Methods and polynucleotides are provided to amplify a segment of nucleotides
comprising one or more biallelic markers of the present invention. It will be appreciated that
amplification of DNA fragments comprising biallelic markers may be used in various methods
and for various purposes and is not restricted to genotyping. Nevertheless, many genotyping
methods, although not all, require the prior amplification of the DNA region carrying the
biallelic marker of interest. Such methods specifically increase the concentration or total
number of sequences that span the biallelic marker or include that site and sequences located 
either distal or proximal to it. Diagnostic assays may also rely on amplification of DNA 
segments carrying a biallelic marker of the present invention. 

Amplification of DNA may be achieved by any method known in the art, for example by the
established PCR (polymerase chain reaction) method or by developments thereof or
alternatives. Amplification methods which can be utilized herein include but are not limited to
Ligase Chain Reaction (LCR) as described in EP A 320 308 and EP A 439 182, Gap LCR
(Wolcott, M.J.), the so-called "NASBA" or "3SR" technique described in Guatelli J.C. et al.
(1990) and in Compton J. (1991), Q-beta amplification as described in EP A 4544 610, strand
displacement amplification as described in Walker et al. (1996) and EP A 684 315, and target
mediated amplification as described in PCT Publication WO 9322461.

LCR and Gap LCR are exponential amplification techniques, and both depend on DNA
ligase to join adjacent primers annealed to a DNA molecule. In Ligase Chain Reaction (LCR),
probe pairs are used which include two primary (first and second) and two secondary (third and
fourth) probes, all of which are employed in molar excess to target. The first probe hybridizes
to a first segment of the target strand and the second probe hybridizes to a second segment of
the target strand, the first and second segments being contiguous so that the primary probes abut
one another in 5' phosphate-3' hydroxyl relationship, and so that a ligase can covalently fuse or
ligate the two probes into a fused product. In addition, a third (secondary) probe can hybridize
to a portion of the first probe and a fourth (secondary) probe can hybridize to a portion of the
second probe in a similar abutting fashion. Of course, if the target is initially double stranded,
the secondary probes also will hybridize to the target complement in the first instance. Once the
ligated strand of primary probes is separated from the target strand, it will hybridize with the
third and fourth probes, which can be ligated to form a complementary, secondary ligated
product. It is important to realize that the ligated products are functionally equivalent to either
the target or its complement. By repeated cycles of hybridization and ligation, amplification of
the target sequence is achieved. A method for multiplex LCR has also been described (WO
9320227). Gap LCR (GLCR) is a version of LCR where the probes are not adjacent but are
separated by 2 to 3 bases.
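The probe geometry described above can be made concrete with a short sketch. This is an illustration only, not a protocol from the present invention: the helper names are hypothetical, the target is assumed to be given as its 5'->3' top strand, and `junction` is an assumed index at which the primary probes meet. It shows that the ligated secondary probes reproduce the target and the ligated primary probes reproduce its complement, and how a Gap LCR layout leaves unpaired bases between the probes.

```python
# Sketch of LCR / Gap LCR probe layout (hypothetical helpers).
# Assumptions: `target` is the 5'->3' top strand; `junction` is the index
# where the probes meet; `gap` is the number of unpaired bases (Gap LCR).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence, returned 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def lcr_probes(target: str, junction: int, gap: int = 0) -> dict:
    """Return primary and secondary probes flanking `junction`.

    gap == 0 gives abutting probes (classical LCR); a gap of 2-3 bases
    gives the Gap LCR layout, where the gap must be filled before ligation.
    """
    if junction <= 0 or junction + gap >= len(target):
        raise ValueError("junction and gap must leave two non-empty segments")
    left, right = target[:junction], target[junction + gap:]
    return {
        "first": revcomp(left),    # primary: pairs with the left target segment
        "second": revcomp(right),  # primary: pairs with the right target segment
        "third": left,             # secondary: pairs with the first probe
        "fourth": right,           # secondary: pairs with the second probe
    }
```

For example, with `lcr_probes("ACGTTGCAAGGT", 6)`, joining `third + fourth` gives the target itself and joining `second + first` (read 5'->3') gives `revcomp(target)`, which is the functional equivalence of ligated products noted above.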
For amplification of mRNAs, it is within the scope of the present invention to reverse
transcribe mRNA into cDNA followed by polymerase chain reaction (RT-PCR); or to use a
single enzyme for both steps as described in U.S. Patent No. 5,322,770; or to use Asymmetric
Gap LCR (RT-AGLCR) as described by Marshall R.L. et al. (1994). AGLCR is a modification
of GLCR that allows the amplification of RNA.

Some of these amplification methods are particularly suited for the detection of single
nucleotide polymorphisms and allow the simultaneous amplification of a target sequence and
the identification of the polymorphic nucleotide, as further described herein.

The PCR technology is the preferred amplification technique used in the present
invention. A variety of PCR techniques are familiar to those skilled in the art. For a review of
PCR technology, see Molecular Cloning to Genetic Engineering, White, B.A. Ed. (1997) and
the publication entitled "PCR Methods and Applications" (1991, Cold Spring Harbor
Laboratory Press). In each of these PCR procedures, PCR primers on either side of the nucleic
acid sequences to be amplified are added to a suitably prepared nucleic acid sample along with
dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent
polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically
hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are
extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated.
The cycles are repeated multiple times to produce an amplified fragment containing the nucleic
acid sequence between the primer sites. PCR has further been described in several patents
including US Patents 4,683,195, 4,683,202 and 4,965,188.
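Because each cycle of denaturation, hybridization and extension can at most double the number of target copies, the yield grows exponentially with the number of cycles. The sketch below is an idealized model under stated assumptions, not a description of any particular protocol: the per-cycle `efficiency` is an assumed parameter (1.0 meaning perfect doubling), and the plateau reached when primers, dNTPs or polymerase become limiting is ignored.

```python
# Idealized PCR yield model (assumption: constant per-cycle efficiency,
# no plateau phase). Function names are illustrative only.


def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected target copies after `cycles`; each cycle multiplies by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(initial_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles giving at least `target_copies`."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    cycles, copies = 0, float(initial_copies)
    while copies < target_copies:
        copies *= 1.0 + efficiency  # one more round of amplification
        cycles += 1
    return cycles
```

Under this model, 100 starting copies give 100 x 2^30 copies after 30 perfect cycles, and 1,000 copies need 20 cycles to exceed 10^9.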

Primers can be prepared by any suitable method, for example by direct chemical
synthesis using a method such as the phosphotriester method of Narang S.A. et al. (1979), the
phosphodiester method of Brown E.L. et al. (1979), the diethylphosphoramidite method of
Beaucage et al. (1981) or the solid support method described in EP 0 707 592.

In some embodiments the present invention provides primers for amplifying a DNA
fragment containing one or more biallelic markers of the present invention. It will be
appreciated that the primers listed are merely exemplary and that any other set of primers which
produces amplification products containing one or more biallelic markers of the present
invention may also be used.

The spacing of the primers determines the length of the segment to be amplified. In the
context of the present invention, amplified segments carrying biallelic markers can range in size
from about 25 bp to about 35 kbp. Amplification fragments from 25-3000 bp are typical,
fragments from 50-1000 bp are preferred and fragments from 100-600 bp are highly preferred.
It will be appreciated that amplification primers for the biallelic markers may be any sequence
which allows the specific amplification of any DNA fragment carrying the markers.
Amplification primers may be labeled or immobilized on a solid support as described in the
section titled "Oligonucleotide Probes and Primers".
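The relationship between primer spacing and amplicon size can be checked in silico before any reaction is run. The following is a minimal sketch, not part of the disclosed methods: the helper names are hypothetical, primers are assumed to match the top-strand template exactly at a single site, and the default 100-600 bp window simply mirrors the range stated above as highly preferred. Requiring the marker to lie between the primers, rather than under a primer, is an added design assumption.

```python
# In-silico PCR sketch (hypothetical helpers; exact-match primers only).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence, returned 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def predict_amplicon(template: str, forward: str, reverse: str):
    """Return (start, end) of the amplicon on the top strand (end exclusive), or None.

    The forward primer matches the top strand; the reverse primer matches
    the bottom strand, so its reverse complement is searched on the top strand.
    """
    start = template.find(forward)
    if start < 0:
        return None
    site = template.find(revcomp(reverse), start + len(forward))
    if site < 0:
        return None
    return start, site + len(reverse)


def amplicon_ok(template: str, forward: str, reverse: str, marker_pos: int,
                min_len: int = 100, max_len: int = 600) -> bool:
    """True if the amplicon size is in range and the marker lies between the primers."""
    span = predict_amplicon(template, forward, reverse)
    if span is None:
        return False
    start, end = span
    inner_start, inner_end = start + len(forward), end - len(reverse)
    return inner_start <= marker_pos < inner_end and min_len <= end - start <= max_len
```

A marker position that falls inside a primer-binding site is rejected here, since the primer sequence rather than the sample would then dictate the base at that position in the amplified product.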

Methods of Genotyping DNA samples for Biallelic Markers 

Any method known in the art can be used to identify the nucleotide present at a biallelic 
marker site. Since the biallelic marker allele to be detected has been identified and specified in 
the present invention, detection will prove simple for one of ordinary skill in the art by 
employing any of a number of techniques. Many genotyping methods require the previous 
amplification of the DNA region carrying the biallelic marker of interest. While the 
amplification of target or signal is often preferred at present, ultrasensitive detection methods 
which do not require amplification are also encompassed by the present genotyping methods. 
Methods well-known to those skilled in the art that can be used to detect biallelic 
polymorphisms include methods such as conventional dot blot analyses, single strand
conformational polymorphism analysis (SSCP) described by Orita et al. (1989), denaturing 
gradient gel electrophoresis (DGGE), heteroduplex analysis, mismatch cleavage detection, and 
other conventional techniques as described in Sheffield, V.C. et al. (1991), White et al. (1992), 
Grompe, M. et al. (1989) and Grompe, M. (1993). Another method for determining the 
identity of the nucleotide present at a particular polymorphic site employs a specialized 
exonuclease-resistant nucleotide derivative as described in US patent 4,656,127. 

Preferred methods involve directly determining the identity of the nucleotide present at 
a biallelic marker site by sequencing assay, enzyme-based mismatch detection assay, or 
hybridization assay. The following is a description of some preferred methods. A highly 
preferred method is the microsequencing technique. The term "sequencing assay" is used 
herein to refer to polymerase extension of duplex primer/template complexes and includes both 
traditional sequencing and microsequencing. 

1) Sequencing assays 

The nucleotide present at a polymorphic site can be determined by sequencing methods. 
In a preferred embodiment, DNA samples are subjected to PCR amplification before 
sequencing as described above. DNA sequencing methods are described herein. Preferably,
the amplified DNA is subjected to automated dideoxy terminator sequencing reactions using a 
dye-primer cycle sequencing protocol. Sequence analysis allows the identification of the base 
present at the biallelic marker site. 

2) Microsequencing assays 

In microsequencing methods, a nucleotide at the polymorphic site that is unique to one 
of the alleles in a target DNA is detected by a single nucleotide primer extension reaction. This 
method involves appropriate microsequencing primers which hybridize just upstream of a
polymorphic base of interest in the target nucleic acid. A polymerase is used to specifically
extend the 3' end of the primer with one single ddNTP (chain terminator) complementary to the
selected nucleotide at the polymorphic site. Next, the identity of the incorporated nucleotide is
determined in any suitable way. 
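The design rule above (a primer whose 3' end lies immediately next to the polymorphic base) can be sketched as follows. This is an illustration under assumptions, not the primers of Table 6d: `seq` is assumed to be the top strand written 5'->3', `pos` the 0-based marker index, and the helper names are hypothetical. A forward primer is identical to the top strand upstream of the marker and is extended with a ddNTP equal to the top-strand allele; a reverse primer anneals downstream on the top strand and incorporates the complementary base.

```python
# Microsequencing primer sketch (hypothetical helpers and assumptions above).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence, returned 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def microseq_primers(seq: str, pos: int, length: int = 20) -> dict:
    """Forward and reverse primers whose 3' ends abut the marker at `pos`."""
    if pos - length < 0 or pos + 1 + length > len(seq):
        raise ValueError("not enough flanking sequence for this primer length")
    return {
        # Same sequence as the top strand just upstream of the marker; it
        # anneals to the bottom strand and is extended across `pos`.
        "forward": seq[pos - length:pos],
        # Anneals to the top strand just downstream of the marker.
        "reverse": revcomp(seq[pos + 1:pos + 1 + length]),
    }


def expected_ddntp(allele: str, primer: str) -> str:
    """Base of the ddNTP incorporated for a given top-strand allele."""
    return allele if primer == "forward" else allele.translate(COMPLEMENT)
```

For instance, on `"GATCCTAGGCATGCAATTGCC"` with the marker at index 10 and 5-base primers, the forward primer is `TAGGC` and the reverse primer is `TTGCA`; an A allele is read as ddATP from the forward primer and as ddTTP from the reverse primer.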

Typically, microsequencing reactions are carried out using fluorescent ddNTPs and the 
extended microsequencing primers are analyzed by electrophoresis on ABI 377 sequencing 
machines to determine the identity of the incorporated nucleotide as described in EP 412 883.
Alternatively, capillary electrophoresis can be used in order to process a higher number of assays
simultaneously. An example of a typical microsequencing procedure that can be used in the
context of the present invention is provided in example 4. 

Different approaches can be used to detect the nucleotide added to the microsequencing 
primer. A homogeneous phase detection method based on fluorescence resonance energy 
transfer has been described by Chen and Kwok (1997) and Chen et al. (1997). In this method,
amplified genomic DNA fragments containing polymorphic sites are incubated with a 5'- 
fluorescein-labeled primer in the presence of allelic dye-labeled dideoxyribonucleoside 
triphosphates and a modified Taq polymerase. The dye-labeled primer is extended one base by 
the dye-terminator specific for the allele present on the template. At the end of the genotyping 
reaction, the fluorescence intensities of the two dyes in the reaction mixture are analyzed 
directly without separation or purification. All these steps can be performed in the same tube 
and the fluorescence changes can be monitored in real time. Alternatively, the extended primer 
may be analyzed by MALDI-TOF Mass Spectrometry. The base at the polymorphic site is 
identified by the mass added onto the microsequencing primer (see Haff L.A. and Smirnov I.P.,
1997).
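In the homogeneous read-out described above, the genotype is inferred from the relative fluorescence of the two allele-specific dye terminators. The sketch below shows one simple way such a call could be made; it is not the procedure of Chen and Kwok. The fraction cut-offs (0.2 and 0.8) and the minimum total signal are hypothetical values chosen for illustration, and real assays would calibrate them on control samples of known genotype.

```python
# Two-dye genotype calling sketch. Cut-offs are hypothetical, not from the source.


def call_genotype(signal_allele1: float, signal_allele2: float,
                  low: float = 0.2, high: float = 0.8,
                  min_total: float = 1.0) -> str:
    """Return '1/1', '1/2', '2/2' or 'no call' from background-corrected
    intensities of the dye terminators specific to allele 1 and allele 2."""
    total = signal_allele1 + signal_allele2
    if total < min_total:
        return "no call"  # too little extension product to decide
    fraction1 = signal_allele1 / total  # share of signal from allele-1 dye
    if fraction1 >= high:
        return "1/1"
    if fraction1 <= low:
        return "2/2"
    return "1/2"
```

With these assumed cut-offs, intensities of 9.0 and 1.0 are called homozygous for allele 1, equal intensities are called heterozygous, and a sample with almost no signal is left uncalled rather than forced into a genotype.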

Microsequencing may be achieved by the established microsequencing method or by 
developments or derivatives thereof. Alternative methods include several solid-phase 
microsequencing techniques. The basic microsequencing protocol is the same as described 
previously, except that the method is conducted as a heterogeneous phase assay, in which the
primer or the target molecule is immobilized or captured onto a solid support. To simplify the
primer separation and the terminal nucleotide addition analysis, oligonucleotides are attached to
solid supports or are modified in such ways that permit affinity separation as well as polymerase
extension. The 5' ends and internal nucleotides of synthetic oligonucleotides can be modified in
a number of different ways to permit different affinity separation approaches, e.g., biotinylation.
If a single affinity group is used on the oligonucleotides, the oligonucleotides can be separated
from the incorporated terminator reagent. This eliminates the need for physical or size separation.
More than one oligonucleotide can be separated from the terminator reagent and analyzed
simultaneously if more than one affinity group is used. This permits the analysis of several 
nucleic acid species or more nucleic acid sequence information per extension reaction. The 
affinity group need not be on the priming oligonucleotide but could alternatively be present on 
the template. For example, immobilization can be carried out via an interaction between 
biotinylated DNA and streptavidin-coated microtitration wells or avidin-coated polystyrene 
particles. In the same manner oligonucleotides or templates may be attached to a solid support 
in a high-density format. In such solid phase microsequencing reactions, incorporated ddNTPs 
can be radiolabeled (Syvanen, 1994) or linked to fluorescein (Livak and Hainer, 1994). The 
detection of radiolabeled ddNTPs can be achieved through scintillation-based techniques. The 
detection of fluorescein-linked ddNTPs can be based on the binding of antifluorescein antibody 
conjugated with alkaline phosphatase, followed by incubation with a chromogenic substrate 
(such as p-nitrophenyl phosphate). Other possible reporter-detection pairs include: ddNTP
linked to dinitrophenyl (DNP) and anti-DNP alkaline phosphatase conjugate (Harju et al., 1993) 
or biotinylated ddNTP and horseradish peroxidase-conjugated streptavidin with o- 
phenylenediamine as a substrate (WO 92/15712). As yet another alternative solid-phase 
microsequencing procedure, Nyren et al. (1993) described a method relying on the detection of 
DNA polymerase activity by an enzymatic luminometric inorganic pyrophosphate detection 
assay (ELIDA).

Pastinen et al. (1997) describe a method for multiplex detection of single nucleotide
polymorphisms in which the solid phase minisequencing principle is applied to an
oligonucleotide array format. High-density arrays of DNA probes attached to a solid support
(DNA chips) are further described herein.

In one aspect the present invention provides polynucleotides and methods to genotype 
one or more biallelic markers of the present invention by performing a microsequencing assay. 
Preferred microsequencing primers include those featured in Table 6d. It will be
appreciated that the microsequencing primers listed in Table 6d are merely exemplary and that
any primer having a 3' end immediately adjacent to a polymorphic nucleotide may be used.
Similarly, it will be appreciated that microsequencing analysis may be performed for any
biallelic marker or any combination of biallelic markers of the present invention. One aspect of
the present invention is a solid support which includes one or more microsequencing primers
listed in Table 6d, or fragments comprising at least 8, at least 12, at least 15, or at least 20
consecutive nucleotides thereof and having a 3' terminus immediately upstream of the 
corresponding biallelic marker, for determining the identity of a nucleotide at biallelic marker 
site. 
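Because a usable fragment must keep the primer's 3' terminus immediately upstream of the marker, fragments of a given length are obtained by trimming bases from the 5' end only. The helper below is a hypothetical illustration of that rule; the default lengths echo the 8, 12, 15 and 20 nucleotide thresholds recited above, and the primer is assumed to be written 5'->3'.

```python
# Sketch: 3'-anchored fragments of a microsequencing primer (hypothetical helper).


def three_prime_fragments(primer: str, lengths=(8, 12, 15, 20)) -> dict:
    """Return {length: fragment} for each requested length that fits.

    Each fragment is the last `length` bases of the 5'->3' primer, so it
    retains the 3' terminus that sits immediately upstream of the marker.
    """
    return {n: primer[-n:] for n in lengths if 0 < n <= len(primer)}
```

Trimming from the 3' end instead would move the extension point away from the polymorphic base, so the incorporated ddNTP would no longer report the marker allele.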
3) Mismatch detection assays based on polymerases and ligases

In one aspect the present invention provides polynucleotides and methods to determine
the allele of one or more biallelic markers of the present invention in a biological sample by
mismatch detection assays based on polymerases and/or ligases. These assays are based on the
specificity of polymerases and ligases. Polymerization reactions place particularly stringent
requirements on correct base pairing of the 3' end of the amplification primer, and the joining of
two oligonucleotides hybridized to a target DNA sequence is quite sensitive to mismatches
close to the ligation site, especially at the 3' end. The term "enzyme based mismatch detection
assay" is used herein to refer to any method of determining the allele of a biallelic marker
based on the specificity of ligases and polymerases. Preferred methods are described below.
Methods, primers and various parameters to amplify DNA fragments comprising biallelic 
markers of the present invention are further described herein. 
Allele specific amplification 

Discrimination between the two alleles of a biallelic marker can also be achieved by 
allele specific amplification, a selective strategy, whereby one of the alleles is amplified without 
amplification of the other allele. This is accomplished by placing a polymorphic base at the 3' 
end of one of the amplification primers. Because extension proceeds from the 3' end of the
primer, a mismatch at or near this position has an inhibitory effect on amplification. Therefore,
under appropriate amplification conditions, these primers only direct amplification on their
complementary allele. Designing the appropriate allele-specific primer and the corresponding
assay conditions is well within the ordinary skill in the art.
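The placement rule for allele-specific amplification can be sketched in a few lines. This is a simplified model under stated assumptions, not a validated primer-design method: `seq` is the top strand 5'->3', `pos` the 0-based marker index, both primers share the same 5' stem, and `three_prime_matches` is a hypothetical stand-in for the polymerase's requirement that the 3' terminal base pair correctly with the template.

```python
# Allele-specific primer sketch (hypothetical helpers; simplified 3'-match model).


def allele_specific_primers(seq: str, pos: int, alleles: str, length: int = 20) -> dict:
    """Return {allele: primer}; each primer ends (3') on the marker base."""
    if length < 2 or pos - (length - 1) < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    stem = seq[pos - (length - 1):pos]  # 5' portion shared by both primers
    return {allele: stem + allele for allele in alleles}


def three_prime_matches(primer: str, template: str, start: int) -> bool:
    """Does the primer's 3' base match the template position it would pair
    with, if the primer sits at `start`? Both strings are written as
    top-strand sequence, so a match means correct base pairing."""
    return template[start + len(primer) - 1] == primer[-1]
```

On a template carrying the A allele, the primer ending in A satisfies the 3' check while the primer ending in G does not, which in the assay corresponds to amplification of only the complementary allele.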

Ligation/amplification based methods 

The "Oligonucleotide Ligation Assay" (OLA) uses two oligonucleotides which are 
designed to be capable of hybridizing to abutting sequences of a single strand of a target 
molecule. One of the oligonucleotides is biotinylated, and the other is detectably labeled. If
the precise complementary sequence is found in a target molecule, the oligonucleotides will
hybridize such that their termini abut, and create a ligation substrate that can be captured and
detected. OLA is capable of detecting biallelic markers and may be advantageously combined
with PCR as described by Nickerson D.A. et al. (1990). In this method, PCR is used to achieve
the exponential amplification of target DNA, which is then detected using OLA. 
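The logic of OLA can be modelled crudely in code. In the sketch below, which is an assumption-laden illustration rather than the assay of Nickerson et al., an allele-specific oligonucleotide ends (3') on the marker base and a common oligonucleotide starts immediately downstream; ligation is modelled as occurring only when both match abutting stretches of the target exactly. Which oligonucleotide carries the biotin and which the reporter label is not modelled, and the helper names are hypothetical.

```python
# Oligonucleotide ligation assay sketch (hypothetical helpers; exact-match model).
# Oligos are written as top-strand sequence, i.e. they anneal to the bottom strand.


def design_ola(seq: str, pos: int, allele: str, flank: int = 10):
    """Return (allele_specific, common) oligos that abut at the marker `pos`."""
    if pos - flank + 1 < 0 or pos + 1 + flank > len(seq):
        raise ValueError("not enough flanking sequence")
    allele_specific = seq[pos - flank + 1:pos] + allele  # 3' end on the marker
    common = seq[pos + 1:pos + 1 + flank]                # starts right after it
    return allele_specific, common


def ola_ligates(target_top: str, allele_specific: str, common: str) -> bool:
    """Ligation requires both oligos to match abutting target stretches exactly."""
    return (allele_specific + common) in target_top
```

With a target carrying the A allele, the A-specific oligonucleotide and the common oligonucleotide form a ligatable pair, whereas the G-specific oligonucleotide leaves a mismatch at the junction and no product is formed.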

Other methods which are particularly suited for the detection of biallelic markers
include LCR (ligase chain reaction) and Gap LCR (GLCR), which are described herein. As
mentioned above, LCR uses two pairs of probes to exponentially amplify a specific target. The
sequences of each pair of oligonucleotides are selected to permit the pair to hybridize to abutting
sequences of the same strand of the target. Such hybridization forms a substrate for a
template-dependent ligase. In accordance with the present invention, LCR can be performed with
oligonucleotides having the proximal and distal sequences of the same strand of a biallelic
marker site. In one embodiment, either oligonucleotide will be designed to include the biallelic
marker site. In such an embodiment, the reaction conditions are selected such that the
oligonucleotides can be ligated together only if the target molecule either contains or lacks the
specific nucleotide(s) that is complementary to the biallelic marker on the oligonucleotide. In
an alternative embodiment, the oligonucleotides will not include the biallelic marker, such that
when they hybridize to the target molecule, a "gap" is created as described in WO 90/01069.
This gap is then "filled" with complementary dNTPs (as mediated by DNA polymerase), or by an
additional pair of oligonucleotides. Thus at the end of each cycle, each single strand has a 
complement capable of serving as a target during the next cycle and exponential allele-specific 
amplification of the desired sequence is obtained. 

Ligase/Polymerase-mediated Genetic Bit Analysis™ is another method for determining 
the identity of a nucleotide at a preselected site in a nucleic acid molecule (WO 95/21271). This 
method involves the incorporation of a nucleoside triphosphate that is complementary to the
nucleotide present at the preselected site onto the terminus of a primer molecule, and the
subsequent ligation of the extended primer to a second oligonucleotide. The reaction is
monitored by detecting a specific label attached to the reaction's solid phase or by detection in
solution.

4) Hybridization assay methods 

A preferred method of determining the identity of the nucleotide present at a biallelic 
marker site involves nucleic acid hybridization. The hybridization probes, which can be 
conveniently used in such reactions, preferably include the probes defined herein. Any 
hybridization assay may be used including Southern hybridization, Northern hybridization, dot 
blot hybridization and solid-phase hybridization (see Sambrook et al., Molecular Cloning - A
Laboratory Manual, Second Edition, Cold Spring Harbor Press, N.Y., 1989). 

Hybridization refers to the formation of a duplex structure by two single stranded 
nucleic acids due to complementary base pairing. Hybridization can occur between exactly 
complementary nucleic acid strands or between nucleic acid strands that contain minor regions 
of mismatch. Specific probes can be designed that hybridize to one form of a biallelic marker 
and not to the other and therefore are able to discriminate between different allelic forms. 
Allele-specific probes are often used in pairs, one member of a pair showing perfect match to a 
target sequence containing the original allele and the other showing a perfect match to the target 
sequence containing the alternative allele. Hybridization conditions should be sufficiently 
stringent that there is a significant difference in hybridization intensity between alleles, and 
preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles. 
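A pair of allele-specific probes as described above differs only at the marker position, and placing that position at the centre of the probe is a common way to maximize the destabilizing effect of the mismatch. The sketch below assumes `seq` is the top strand (5'->3'), `pos` the 0-based marker index and an odd probe length; the helper names are hypothetical, and the probes are written in the sense of `seq`, so their reverse complements would be used to detect the top strand itself. The mismatch count is only a proxy for the intensity difference that stringent conditions are meant to produce.

```python
# Allele-specific probe pair sketch (hypothetical helpers; centred marker).


def allele_probe_pair(seq: str, pos: int, alleles: str, length: int = 15) -> dict:
    """Return {allele: probe}; probes of odd `length` with the marker at the centre."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the marker is central")
    half = length // 2
    if pos - half < 0 or pos + 1 + half > len(seq):
        raise ValueError("not enough flanking sequence")
    left, right = seq[pos - half:pos], seq[pos + 1:pos + 1 + half]
    return {allele: left + allele + right for allele in alleles}


def mismatches(probe: str, target_window: str) -> int:
    """Number of positions at which the probe and target window differ."""
    return sum(p != t for p, t in zip(probe, target_window))
```

Against a target window carrying the A allele, the A probe has no mismatch and the G probe has a single central mismatch, which is the situation the stringency conditions below are chosen to resolve.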
Stringent, sequence specific hybridization conditions, under which a probe will hybridize only
to the exactly complementary target sequence, are well known in the art (Sambrook et al.,
Molecular Cloning - A Laboratory Manual, Second Edition, Cold Spring Harbor Press, N.Y.,
1989). Stringent conditions are sequence dependent and will be different in different
circumstances. Generally, stringent conditions are selected to be about 5°C lower than the
thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. By
way of example and not limitation, procedures using conditions of high stringency are as
follows: Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65°C in
buffer composed of 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02%
Ficoll, 0.02% BSA, and 500 µg/ml denatured salmon sperm DNA. Filters are hybridized for
48 h at 65°C, the preferred hybridization temperature, in prehybridization mixture containing
100 µg/ml denatured salmon sperm DNA and 5-20 x 10^6 cpm of 32P-labeled probe.
Alternatively, the hybridization step can be performed at 65°C in the presence of SSC buffer,
1 x SSC corresponding to 0.15 M NaCl and 0.015 M Na citrate. Subsequently, filter washes can
be done at 37°C for 1 h in a solution containing 2X SSC, 0.01% PVP, 0.01% Ficoll, and 0.01%
BSA, followed by a wash in 0.1X SSC at 50°C for 45 min. Alternatively, filter washes can be
performed in a solution containing 2 x SSC and 0.1% SDS, or 0.5 x SSC and 0.1% SDS, or 0.1
x SSC and 0.1% SDS at 68°C for 15 minute intervals. Following the wash steps, the hybridized
probes are detectable by autoradiography. By way of example and not limitation, procedures
using conditions of intermediate stringency are as follows: Filters containing DNA are
prehybridized, and then hybridized at a temperature of 60°C in the presence of a 5 x SSC buffer
and labeled probe. Subsequently, filter washes are performed in a solution containing 2x SSC
at 50°C and the hybridized probes are detectable by autoradiography. Other conditions of high
and intermediate stringency which may be used are well known in the art, as cited in
Sambrook et al. (Molecular Cloning - A Laboratory Manual, Second Edition, Cold Spring
Harbor Press, N.Y., 1989) and Ausubel et al. (Current Protocols in Molecular Biology, Greene
Publishing Associates and Wiley Interscience, N.Y., 1989). 
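The rule of thumb that stringent conditions lie about 5°C below Tm can be illustrated for short oligonucleotide probes. The sketch below swaps in the Wallace rule, Tm = 2°C x (A+T) + 4°C x (G+C), which is a rough estimate valid only for short oligonucleotides (roughly 14-20 nt) in high salt; it does not apply to the long radiolabeled probes used at 65°C in the procedures above. The SSC helper simply scales the standard 1x composition (0.15 M NaCl, 0.015 M sodium citrate). All function names are hypothetical.

```python
# Stringency sketch: Wallace-rule Tm for short oligos and SSC dilutions.
# Assumption: Wallace rule is only a rough estimate for ~14-20 nt probes.


def wallace_tm(probe: str) -> float:
    """Estimated Tm in °C: 2 per A/T plus 4 per G/C."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(probe: str, offset: float = 5.0) -> float:
    """Hybridization temperature about `offset` °C below the estimated Tm."""
    return wallace_tm(probe) - offset


def ssc_composition(x: float) -> dict:
    """Molar NaCl and sodium citrate in x-fold SSC (1x = 0.15 M / 0.015 M)."""
    return {"NaCl": 0.15 * x, "Na_citrate": 0.015 * x}
```

For the 15-mer `ACGTACGTACGTACG` (7 A/T, 8 G/C) this gives an estimated Tm of 46°C and a stringent hybridization temperature of about 41°C, while 6X SSC corresponds to about 0.9 M NaCl.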

Although such hybridizations can be performed in solution, it is preferred to employ a 
solid-phase hybridization assay. The target DNA comprising a biallelic marker of the present 
invention may be amplified prior to the hybridization reaction. The presence of a specific allele 
in the sample is determined by detecting the presence or the absence of stable hybrid duplexes
formed between the probe and the target DNA. The detection of hybrid duplexes can be carried 
out by a number of methods. Various detection assay formats are well known which utilize 
detectable labels bound to either the target or the probe to enable detection of the hybrid 
duplexes. Typically, hybridization duplexes are separated from unhybridized nucleic acids and 
the labels bound to the duplexes are then detected. Those skilled in the art will recognize that
wash steps may be employed to wash away excess target DNA or probe. Standard 
heterogeneous assay formats are suitable for detecting the hybrids using the labels present on 
the primers and probes. 

Two recently developed assays allow hybridization-based allele discrimination with no 
need for separations or washes (see Landegren U. et al., 1998). The TaqMan assay takes
advantage of the 5' nuclease activity of Taq DNA polymerase to digest a DNA probe annealed 
specifically to the accumulating amplification product. TaqMan probes are labeled with a 
donor-acceptor dye pair that interacts via fluorescence energy transfer. Cleavage of the 
TaqMan probe by the advancing polymerase during amplification dissociates the donor dye 
from the quenching acceptor dye, greatly increasing the donor fluorescence. All reagents 
necessary to detect two allelic variants can be assembled at the beginning of the reaction and the 
results are monitored in real time (see Livak et al., 1995). In an alternative homogeneous
hybridization-based procedure, molecular beacons are used for allele discrimination.
Molecular beacons are hairpin-shaped oligonucleotide probes that report the presence of 
specific nucleic acids in homogeneous solutions. When they bind to their targets they undergo a 
conformational reorganization that restores the fluorescence of an internally quenched 
fluorophore (Tyagi et al., 1998). 

By assaying the hybridization to an allele specific probe, one can detect the presence or 
absence of a biallelic marker allele in a given sample. 

High-Throughput parallel hybridizations in array format are specifically encompassed 
within "hybridization assays" and are described below. 

Hybridization to addressable arrays of oligonucleotides 

Hybridization assays based on oligonucleotide arrays rely on the differences in 
hybridization stability of short oligonucleotides to perfectly matched and mismatched target 
sequence variants. Efficient access to polymorphism information is obtained through a basic 
structure comprising high-density arrays of oligonucleotide probes attached to a solid support 
(the chip) at selected positions. Each DNA chip can contain thousands to millions of individual 
synthetic DNA probes arranged in a grid-like pattern and miniaturized to the size of a dime. 

The chip technology has already been applied with success in numerous cases. For 
example, the screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae 
mutant strains, and in the protease gene of HIV-1 virus (Hacia et al., 1996; Shoemaker et al., 
1996; Kozal et al., 1996). Chips of various formats for use in detecting biallelic
polymorphisms can be produced on a customized basis by Affymetrix (GeneChip™), Hyseq 
(HyChip and HyGnostics), and Protogene Laboratories. 



WO 00/58510 

PCT/IB00/00435 


In general, these methods employ arrays of oligonucleotide probes that are 
complementary to target nucleic acid sequence segments from an individual, which target
sequences include a polymorphic marker. EP785280 describes a tiling strategy for the
detection of single nucleotide polymorphisms. Briefly, arrays may generally be "tiled" for a 
large number of specific polymorphisms. By "tiling" is generally meant the synthesis of a 
defined set of oligonucleotide probes which is made up of a sequence complementary to the 
target sequence of interest, as well as preselected variations of that sequence, e.g., substitution 
of one or more given positions with one or more members of the basis set of monomers, i.e. 
nucleotides. Tiling strategies are further described in PCT application No. WO 95/11995. In a
particular aspect, arrays are tiled for a number of specific, identified biallelic marker sequences. 
In particular the array is tiled to include a number of detection blocks, each detection block 
being specific for a specific biallelic marker or a set of biallelic markers. For example, a 
detection block may be tiled to include a number of probes, which span the sequence segment 
that includes a specific polymorphism. To ensure probes that are complementary to each allele, 
the probes are synthesized in pairs differing at the biallelic marker. In addition to the probes 
differing at the polymorphic base, monosubstituted probes are also generally tiled within the 
detection block. These monosubstituted probes have bases at and up to a certain number of 
bases in either direction from the polymorphism, substituted with the remaining nucleotides 
(selected from A, T, G, C and U). Typically the probes in a tiled detection block will include 
substitutions of the sequence positions up to and including those that are 5 bases away from the 
biallelic marker. The monosubstituted probes provide internal controls for the tiled array, to 
distinguish actual hybridization from artefactual cross-hybridization. Upon completion of 
hybridization with the target sequence and washing of the array, the array is scanned to 
determine the position on the array to which the target sequence hybridizes. The hybridization 
data from the scanned array is then analyzed to identify which allele or alleles of the biallelic 
marker are present in the sample. Hybridization and scanning may be carried out as described 
in PCT application No. WO 92/10092 and WO 95/11995 and US patent No. 5,424,186.
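The composition of a tiled detection block can be sketched programmatically. This is an illustration of the tiling idea under assumptions, not the procedure of EP785280: probes are written as top-strand DNA of length 2 x `flank` + 1 centred on the marker, substitutions use A, C, G and T only (U would apply to RNA probes), and monosubstituted probes cover the marker and every position up to `max_offset` = 5 bases on either side, with duplicates and the allele probes themselves removed. The helper name is hypothetical.

```python
# Tiled detection block sketch (hypothetical helper; DNA bases only).

BASES = "ACGT"


def detection_block(seq: str, pos: int, alleles: str, flank: int = 10, max_offset: int = 5):
    """Return (allele_pair, monosubstituted_probes) for the marker at `pos`."""
    if pos - flank < 0 or pos + flank + 1 > len(seq) or max_offset > flank:
        raise ValueError("not enough flanking sequence for this block")
    # Pair of probes differing only at the biallelic marker.
    pair = {a: seq[pos - flank:pos] + a + seq[pos + 1:pos + 1 + flank] for a in alleles}
    mono = set()
    for probe in pair.values():
        # Substitute the marker and each position within max_offset of it.
        for i in range(flank - max_offset, flank + max_offset + 1):
            for base in BASES:
                if base != probe[i]:
                    mono.add(probe[:i] + base + probe[i + 1:])
    mono -= set(pair.values())  # substituting the marker can recreate the other allele
    return pair, sorted(mono)
```

For one marker with a 21-base probe and substitutions up to 5 bases away, this yields the two allele probes plus 62 distinct monosubstituted control probes: 60 from the flanking positions of the two allele probes and 2 carrying the remaining bases at the marker itself.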

Thus, in some embodiments, the chips may comprise an array of nucleic acid sequences 
of fragments of about 15 nucleotides in length. In further embodiments, the chip may comprise 
an array including at least one of the sequences selected from the group consisting of SEQ ID 
Nos. 1 to 26, 36 to 40 and 54 to 229 and the sequences complementary thereto, or a fragment
thereof of at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30,
40, 47, or 50 consecutive nucleotides. In some embodiments, the chip may comprise an array of 
at least 2, 3, 4, 5, 6, 7, 8 or more of these polynucleotides of the invention. Solid supports and 
polynucleotides of the present invention attached to solid supports are further described in the 
section titled "Oligonucleotide probes and Primers".
5) Integrated Systems 

Another technique, which may be used to analyze polymorphisms, includes 
multicomponent integrated systems, which miniaturize and compartmentalize processes such as 
PCR and capillary electrophoresis reactions in a single functional device. An example of such 
technique is disclosed in US patent 5,589,136, which describes the integration of PCR 
amplification and capillary electrophoresis in chips. 

Integrated systems can be envisaged mainly when microfluidic systems are used. These 
systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic 
wafer included on a microchip. The movements of the samples are controlled by electric, 
electroosmotic or hydrostatic forces applied across different areas of the microchip. For 
genotyping biallelic markers, the microfluidic system may integrate nucleic acid amplification, 
microsequencing, capillary electrophoresis and a detection method such as laser-induced 
fluorescence detection. 

Methods Of Genetic Analysis Using The Biallelic Markers Of The Present 
Invention 

Different methods are available for the genetic analysis of complex traits (see Lander
and Schork, 1994). The search for disease-susceptibility genes is conducted using two main
methods: the linkage approach in which evidence is sought for cosegregation between a locus
and a putative trait locus using family studies, and the association approach in which evidence is
sought for a statistically significant association between an allele and a trait or a trait causing
allele (Khoury J. et al., 1993). In general, the biallelic markers of the present invention find use
in any method known in the art to demonstrate a statistically significant correlation between a 
genotype and a phenotype. The biallelic markers may be used in parametric and non-parametric 
linkage analysis methods. Preferably, the biallelic markers of the present invention are used to 
identify genes associated with detectable traits using association studies, an approach which 
does not require the use of affected families and which permits the identification of genes 
associated with complex and sporadic traits. 
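In its simplest form, an association study compares allele counts at a biallelic marker between affected individuals and controls. The sketch below is one standard test for such data, a Pearson chi-square on a 2x2 table of allele counts with one degree of freedom and no continuity correction, together with the allelic odds ratio; it is offered as an illustration and is not necessarily the statistical procedure used in the present invention. The counts in the usage note are invented for illustration, not study data.

```python
# Case-control allelic association sketch (Pearson chi-square, 1 d.f.).
import math


def allelic_association(case_counts, control_counts):
    """Return (chi2, p_value, odds_ratio) for allele counts.

    case_counts and control_counts are (allele 1 count, allele 2 count).
    For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    """
    table = [list(case_counts), list(control_counts)]
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio
```

With hypothetical counts of 60/40 alleles in cases and 40/60 in controls, the chi-square statistic is 8.0 (p of about 0.005) and the allelic odds ratio is 2.25.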

The genetic analysis using the biallelic markers of the present invention may be 
conducted on any scale. The whole set of biallelic markers of the present invention or any 
subset of biallelic markers of the present invention may be used. In some embodiments a subset 
of biallelic markers corresponding to one or several candidate genes of the present invention 
may be used. Alternatively, a subset of biallelic markers of the present invention localised on a 
specific chromosome segment may be used. Further, any set of genetic markers including a
biallelic marker of the present invention may be used. As mentioned above, it should be
noted that the biallelic markers of the present invention may be included in any complete or partial
genetic map of the human genome. These different uses are specifically contemplated in the
present invention and claims.



Linkage analysis 

Linkage analysis is based upon establishing a correlation between the transmission of 
genetic markers and that of a specific trait throughout generations within a family. Thus, the
aim of linkage analysis is to detect marker loci that show cosegregation with a trait of interest in
pedigrees. 

Parametric methods 

When data are available from successive generations, there is the opportunity to study
the degree of linkage between pairs of loci. Estimates of the recombination fraction enable loci
to be ordered and placed onto a genetic map. With loci that are genetic markers, a genetic map
can be established, and then the strength of linkage between markers and traits can be calculated
and used to indicate the relative positions of markers and genes affecting those traits (Weir
B.S., 1996). The classical method for linkage analysis is the logarithm of odds (lod) score
method (see Morton N.E., 1955; Ott J., 1991). Calculation of lod scores requires specification of
the mode of inheritance for the disease (parametric method). Generally, the length of the
candidate region identified using linkage analysis is between 2 and 20Mb. Once a candidate
region is identified as described above, analysis of recombinant individuals using additional
markers allows further delineation of the candidate region. Linkage analysis studies have
generally relied on the use of a maximum of 5,000 microsatellite markers, thus limiting the
maximum theoretical attainable resolution of linkage analysis to about 600 kb on average.
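For the simplest parametric situation, the lod score can be sketched as below. This is a minimal sketch assuming phase-known, fully informative meioses in which recombinants can be counted directly; real pedigree analyses compute likelihoods over whole pedigrees with the programs cited in this section, which this sketch does not attempt. The function names `lod_score` and `max_lod` are hypothetical.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses (assumed setting).

    Compares the likelihood of the observed meioses at recombination
    fraction theta with the likelihood under no linkage (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    def log10_power(p: float, k: int) -> float:
        # log10(p ** k), treating 0 ** 0 as 1
        if k == 0:
            return 0.0
        return k * math.log10(p) if p > 0.0 else float("-inf")

    log_l_theta = log10_power(theta, recombinants) + log10_power(1.0 - theta, nonrecombinants)
    log_l_null = (recombinants + nonrecombinants) * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, nonrecombinants: int) -> tuple:
    """Maximum-likelihood theta (R / n, capped at 0.5) and its lod score."""
    n = recombinants + nonrecombinants
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, lod_score(recombinants, nonrecombinants, theta_hat)


# 2 recombinant and 18 non-recombinant meioses
print(max_lod(2, 18))
```

By convention a maximum lod score of 3 or more (odds of 1000:1 in favour of linkage) is taken as evidence of linkage.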

Linkage analysis has been successfully applied to map simple genetic traits that show 
clear Mendelian inheritance patterns and which have a high penetrance (i.e., the ratio between 
the number of trait positive carriers of allele a and the total number of a carriers in the 
population). However, parametric linkage analysis suffers from a variety of drawbacks. First, it
is limited by its reliance on the choice of a genetic model suitable for each studied trait.
Furthermore, as already mentioned, the resolution attainable using linkage analysis is limited
and complementary studies are required to refine the analysis of the typical 2Mb to 20Mb
regions initially identified through linkage analysis. In addition, parametric linkage analysis
approaches have proven difficult when applied to complex genetic traits, such as those due to
the combined action of multiple genes and/or environmental factors. It is very difficult to
model these factors adequately in a lod score analysis. In such cases, the effort and cost
needed to recruit an adequate number of affected families for linkage analysis become
prohibitive, as recently discussed by Risch, N. and Merikangas, K. (1996).
Non-parametric methods 

The advantage of the so-called non-parametric methods for linkage analysis is that they
do not require specification of the mode of inheritance for the disease; as a result, they tend to be more
useful for the analysis of complex traits. In non-parametric methods, one tries to prove that the
inheritance pattern of a chromosomal region is not consistent with random Mendelian 
segregation by showing that affected relatives inherit identical copies of the region more often 
than expected by chance. Affected relatives should show excess "allele sharing" even in the 
presence of incomplete penetrance and polygenic inheritance. In non-parametric linkage 
analysis the degree of agreement at a marker locus in two individuals can be measured either by 
the number of alleles identical by state (IBS) or by the number of alleles identical by descent 
(IBD). Affected sib pair analysis is a well-known special case and is the simplest form of these 
methods. 
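The allele-sharing idea can be illustrated with two small functions. This is a sketch under stated assumptions: genotypes at a biallelic marker are coded as the number of copies of allele a (0, 1 or 2), and for the affected sib pair test the IBD counts are assumed to have been inferred already (which in practice needs parental genotypes or multipoint inference, not shown). The mean IBD test shown is one common affected sib pair statistic, not necessarily the one intended by the text; the function names are hypothetical.

```python
import math


def ibs_count(g1: int, g2: int) -> int:
    """Alleles shared identical by state (0, 1 or 2) at one biallelic marker;
    genotypes are coded as copies of allele 'a' (0, 1 or 2)."""
    if g1 not in (0, 1, 2) or g2 not in (0, 1, 2):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    # For two alleles, the number of shared alleles is 2 minus the dosage difference
    return 2 - abs(g1 - g2)


def asp_mean_ibd_test(ibd_counts):
    """Mean IBD test for affected sib pairs.

    ibd_counts: alleles shared IBD (0, 1 or 2) by each affected sib pair.
    Under no linkage, sharing 0, 1, 2 has probabilities 1/4, 1/2, 1/4,
    i.e. mean 1 and variance 1/2 per pair. Returns (z, one-sided p-value).
    """
    n = len(ibd_counts)
    mean = sum(ibd_counts) / n
    z = (mean - 1.0) / math.sqrt(0.5 / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z): excess sharing only
    return z, p_value
```

The test is one-sided because linkage to a trait locus is expected to increase, not decrease, sharing among affected relatives.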

The biallelic markers of the present invention may be used in both parametric and non- 
parametric linkage analysis. Preferably biallelic markers may be used in non-parametric 
methods which allow the mapping of genes involved in complex traits. The biallelic markers of 
the present invention may be used in both IBD- and IBS- methods to map genes affecting a 
complex trait. In such studies, taking advantage of the high density of biallelic markers, several 
adjacent biallelic marker loci may be pooled to achieve the efficiency attained by multi-allelic
markers (Zhao et al., 1998).

However, because both parametric and non-parametric linkage analysis methods analyse
affected relatives, they tend to be of limited value in the genetic analysis of drug responses or in
the analysis of side effects to treatments. This type of analysis is impractical in such cases due 
to the lack of availability of familial cases. In fact, the likelihood of having more than one 
individual in a family being exposed to the same drug at the same time is extremely low. 

Population Association Studies 

The present invention comprises methods for identifying one or several genes among a 
set of candidate genes that are associated with a detectable trait using the biallelic markers of 
the present invention. In one embodiment the present invention comprises methods to detect an 
association between a biallelic marker allele or a biallelic marker haplotype and a trait. Further, 
the invention comprises methods to identify a trait causing allele in linkage disequilibrium with 
any biallelic marker allele of the present invention. 

As described above, alternative approaches can be employed to perform association 
studies: genome-wide association studies, candidate region association studies and candidate
gene association studies. The candidate region analysis clearly provides a short-cut approach to
the identification of genes and gene polymorphisms related to a particular trait when some
information concerning the biology of the trait is available. Further, the biallelic markers of the
present invention may be incorporated in any map of genetic markers of the human genome in
order to perform genome-wide association studies. Methods to generate a high-density map of
biallelic markers have been described in US Provisional Patent application serial number
60/082,614. The biallelic markers of the present invention may further be incorporated in any
map of a specific candidate region of the genome (a specific chromosome or a specific
chromosomal segment, for example).

As mentioned above, association studies may be conducted within the general
population and are not limited to studies performed on related individuals in affected families.
Association studies are extremely valuable as they permit the analysis of sporadic or multifactor
traits. Moreover, association studies represent a powerful method for fine-scale mapping,
enabling much finer mapping of trait causing alleles than linkage studies. Studies based on
pedigrees often only narrow the location of the trait causing allele. Association studies using
the biallelic markers of the present invention can therefore be used to refine the location of a
trait causing allele in a candidate region identified by Linkage Analysis methods. Biallelic
markers of the present invention can be used to identify the involved gene; such uses are
specifically contemplated in the present invention and claims.

1) Determining the frequency of a biallelic marker allele or of a biallelic marker 
haplotype in a population 

Another embodiment of the present invention encompasses methods of estimating the
frequency of a haplotype for a set of biallelic markers in a population, comprising the steps of:
a) genotyping each individual in said population for at least one 13q31-q33-related biallelic
marker; b) genotyping each individual in said population for a second biallelic marker by
determining the identity of the nucleotides at said second biallelic marker for both copies of said
second biallelic marker present in the genome; and c) applying a haplotype determination
method to the identities of the nucleotides determined in steps a) and b) to obtain an estimate of
said frequency. In addition, the methods of estimating the frequency of a haplotype of the
invention encompass methods with any further limitation described in this disclosure or those
following, specified alone or in any combination: optionally, said haplotype determination
method is selected from the group consisting of asymmetric PCR amplification, double PCR
amplification of specific alleles, the Clark method, or an expectation maximization algorithm;
optionally, said second biallelic marker is a 13q31-q33-related biallelic marker in a sequence
selected from the group consisting of SEQ ID Nos 1 to 26, 36 to 40 and 54 to 229, and the
complements thereof; optionally, said 13q31-q33-related biallelic marker may be selected
individually or in any combination from the biallelic markers described in Tables 6b and 6c;
optionally, the identity of the nucleotides at the biallelic markers in every one of the sequences
of SEQ ID Nos 1 to 26, 36 to 40 and 54 to 229 is determined in steps a) and b).

Association studies explore the relationships among frequencies for sets of alleles 
between loci. 

Determining the frequency of an allele in a population 

Allelic frequencies of the biallelic markers in a population can be determined using one 
of the methods described above under the heading "Methods for genotyping an individual for 
biallelic markers", or any genotyping procedure suitable for this intended purpose. The
frequency of a biallelic marker allele in a population can be determined by genotyping either
pooled samples or individual samples. One way to reduce the number of genotypings required is
to use pooled samples. A major obstacle to using pooled samples lies in the accuracy and
reproducibility with which DNA concentrations can be determined when setting up the pools.
Genotyping individual samples provides higher sensitivity, reproducibility and accuracy, and is
the preferred method used in the present invention. Preferably, each individual is genotyped
separately and simple gene counting is applied to determine the frequency of an allele of a
biallelic marker or of a genotype in a given population.
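Simple gene counting on individually genotyped samples can be sketched as follows. This is a minimal illustration, assuming each genotype is recorded as an unordered pair of allele labels; the function names and the example data are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Gene counting: each genotype is an unordered pair of alleles, e.g. ("A", "G")."""
    counts = Counter()
    for allele1, allele2 in genotypes:
        counts[allele1] += 1
        counts[allele2] += 1
    total = 2 * len(genotypes)  # two alleles per diploid individual
    return {allele: n / total for allele, n in counts.items()}


def genotype_frequencies(genotypes):
    """Frequency of each unordered genotype in the sample."""
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    return {g: n / len(genotypes) for g, n in counts.items()}


sample = [("A", "A"), ("A", "G"), ("G", "G"), ("G", "A")]
print(allele_frequencies(sample))
print(genotype_frequencies(sample))
```

Sorting each pair makes ("G", "A") and ("A", "G") count as the same genotype, since phase is irrelevant at a single locus.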

Determining the frequency of a haplotype in a population 
The gametic phase of haplotypes is unknown when diploid individuals are
heterozygous at more than one locus. Using genealogical information in families, gametic phase
can sometimes be inferred (Perlin et al., 1994). When no genealogical information is available,
different strategies may be used. One possibility is that the multiple-site heterozygous diploids
can be eliminated from the analysis, keeping only the homozygotes and the single-site
heterozygote individuals, but this approach might lead to a possible bias in the sample
composition and the underestimation of low-frequency haplotypes. Another possibility is that
single chromosomes can be studied independently, for example, by asymmetric PCR
amplification (see Newton et al., 1989; Wu et al., 1989) or by isolation of single chromosomes
by limit dilution followed by PCR amplification (see Ruano et al., 1990). Further, a sample may
be haplotyped for sufficiently close biallelic markers by double PCR amplification of specific
alleles (Sarkar, G. and Sommer S.S., 1991). These approaches are not entirely satisfying either,
because of their technical complexity, the additional cost they entail, their lack of generalisation
at a large scale, or the possible biases they introduce. To overcome these difficulties, an
algorithm to infer the phase of PCR-amplified DNA genotypes introduced by Clark A.G. (1990)
may be used. Briefly, the principle is to start filling a preliminary list of haplotypes present in
the sample by examining unambiguous individuals, that is, the complete homozygotes and the
single-site heterozygotes. Then other individuals in the same sample are screened for the
possible occurrence of previously recognised haplotypes. For each positive identification, the
complementary haplotype is added to the list of recognised haplotypes, until the phase
information for all individuals is either resolved or identified as unresolved. This method
assigns a single pair of haplotypes to each multiheterozygous individual, whereas several pairs are
possible when there is more than one heterozygous site. Alternatively, one can use methods
estimating haplotype frequencies in a population without assigning haplotypes to each
individual. Preferably, a method based on an expectation-maximization (EM) algorithm
(Dempster et al., J. R. Stat. Soc., 1977) leading to maximum-likelihood estimates of haplotype frequencies
under the assumption of Hardy-Weinberg proportions (random mating) is used (see Excoffier L.
and Slatkin M., 1995). The EM algorithm is a generalised iterative maximum-likelihood
approach to estimation that is useful when data are ambiguous and/or incomplete. The EM
algorithm is used to resolve heterozygotes into haplotypes. Haplotype estimations are further
described below under the heading "Statistical methods". Any other method known in the art to
determine or to estimate the frequency of a haplotype in a population may also be used.
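The steps of Clark's algorithm described above can be sketched as follows. This is a simplified sketch, not Clark's published implementation: genotypes are assumed to be coded per biallelic locus as the number of copies of allele 1 (0, 1 or 2), haplotypes are tuples of 0/1, and each ambiguous individual is resolved with the first compatible recognised haplotype in a fixed sorted order, so results can depend on that order. The function name `clark_phase` is hypothetical.

```python
def clark_phase(genotypes):
    """Simplified sketch of Clark's (1990) haplotype inference.

    genotypes: tuples with one entry per biallelic locus, coded as the number
    of copies of allele 1 (0, 1 or 2). Haplotypes are tuples of 0/1.
    Returns (set of recognised haplotypes, list of resolved pairs or None).
    """
    def n_het(g):
        return sum(1 for x in g if x == 1)

    def unique_pair(g):
        # Homozygotes and single-site heterozygotes have only one possible phase
        h1 = tuple(1 if x == 2 else 0 for x in g)
        h2 = tuple(x - y for x, y in zip(g, h1))
        return h1, h2

    def complement(g, h):
        # Haplotype that, paired with h, reproduces genotype g (None if impossible)
        other = tuple(x - y for x, y in zip(g, h))
        return other if all(v in (0, 1) for v in other) else None

    known = set()
    resolved = [None] * len(genotypes)
    for idx, g in enumerate(genotypes):
        if n_het(g) <= 1:
            resolved[idx] = unique_pair(g)
            known.update(resolved[idx])

    changed = True
    while changed:  # repeat until no further individual can be resolved
        changed = False
        for idx, g in enumerate(genotypes):
            if resolved[idx] is not None:
                continue
            for h in sorted(known):  # fixed order; the result can depend on it
                other = complement(g, h)
                if other is not None:
                    resolved[idx] = (h, other)
                    known.add(other)
                    changed = True
                    break
    return known, resolved
```

Individuals left as `None` correspond to the "unresolved" category in the description above.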
2) Linkage Disequilibrium analysis 

Linkage disequilibrium is the non-random association of alleles at two or more loci and 
represents a powerful tool for mapping genes involved in disease traits (see Ajioka R.S. et al.,
1997). Biallelic markers, because they are densely spaced in the human genome and can be
genotyped in greater numbers than other types of genetic markers (such as RFLP or
VNTR markers), are particularly useful in genetic analysis based on linkage disequilibrium.
The biallelic markers of the present invention may be used in any linkage disequilibrium 
analysis method known in the art. 

Briefly, when a disease mutation is first introduced into a population (by a new 
mutation or the immigration of a mutation carrier), it necessarily resides on a single 
chromosome and thus on a single "background" or "ancestral" haplotype of linked markers.
Consequently, there is complete disequilibrium between these markers and the disease
mutation: one finds the disease mutation only in the presence of a specific set of marker alleles.
Through subsequent generations, recombinations occur between the disease mutation and these
marker polymorphisms, and the disequilibrium gradually dissipates. The pace of this
dissipation is a function of the recombination frequency, so the markers closest to the disease
gene will manifest higher levels of disequilibrium than those that are further away. When not
broken up by recombination, "ancestral" haplotypes and linkage disequilibrium between marker
alleles at different loci can be tracked not only through pedigrees but also through populations.
Linkage disequilibrium is usually seen as an association between one specific allele at one locus
and another specific allele at a second locus.
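The statement that the pace of dissipation depends on the recombination frequency can be made concrete with the classical idealised model, in which the disequilibrium coefficient D decays as D_t = D_0 (1 - theta)^t per generation of random mating. This sketch assumes an infinite, randomly mating population with no drift, selection, mutation or migration; the function names and example values are hypothetical.

```python
import math


def ld_after(d0: float, theta: float, generations: int) -> float:
    """Expected disequilibrium D after t generations of random mating
    (idealised model): D_t = D_0 * (1 - theta) ** t."""
    return d0 * (1.0 - theta) ** generations


def half_life(theta: float) -> float:
    """Generations needed for D to fall to half its initial value."""
    return math.log(0.5) / math.log(1.0 - theta)


# A closely linked marker (theta = 0.001) versus a more distant one (theta = 0.05)
for theta in (0.001, 0.05):
    print(theta, ld_after(0.25, theta, 200), half_life(theta))
```

After 200 generations the closely linked marker retains most of its initial disequilibrium while the distant one retains almost none, which is why markers near the disease gene show the highest levels.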

The pattern or curve of disequilibrium between disease and marker loci is expected to 
exhibit a maximum that occurs at the disease locus. Consequently, the amount of linkage 
disequilibrium between a disease allele and closely linked genetic markers may yield valuable 
information regarding the location of the disease gene. For fine-scale mapping of a disease
locus, it is useful to have some knowledge of the patterns of linkage disequilibrium that exist
between markers in the studied region. As mentioned above, the mapping resolution achieved
through the analysis of linkage disequilibrium is much higher than that of linkage studies. The 
high density of biallelic markers combined with linkage disequilibrium analysis provides 
powerful tools for fine-scale mapping. Different methods to calculate linkage disequilibrium 
are described below under the heading "Statistical Methods". 

3) Population-based case-control studies of trait-marker associations 

As mentioned above, the occurrence of pairs of specific alleles at different loci on the 
same chromosome is not random and the deviation from random is called linkage 
disequilibrium. Association studies focus on population frequencies and rely on the 
phenomenon of linkage disequilibrium. If a specific allele in a given gene is directly involved 
in causing a particular trait, its frequency will be statistically increased in an affected (trait 
positive) population, when compared to the frequency in a trait negative population or in a 
random control population. As a consequence of the existence of linkage disequilibrium, the 
frequency of all other alleles present in the haplotype carrying the trait-causing allele will also 
be increased in trait positive individuals compared to trait negative individuals or random 
controls. Therefore, association between the trait and any allele (specifically a biallelic marker 
allele) in linkage disequilibrium with the trait-causing allele will suffice to suggest the presence 
of a trait-related gene in that particular region. Case-control populations can be genotyped for 
biallelic markers to identify associations that narrowly locate a trait causing allele. Because any
marker in linkage disequilibrium with a given trait-associated marker will itself be
associated with the trait, linkage disequilibrium allows the relative frequencies in case-control
populations of a limited number of genetic polymorphisms (specifically biallelic markers) to be 
analysed as an alternative to screening all possible functional polymorphisms in order to find 
trait-causing alleles. Association studies compare the frequency of marker alleles in unrelated 
case-control populations, and represent powerful tools for the dissection of complex traits. 

Case-control populations (inclusion criteria) 

Population-based association studies do not concern familial inheritance but compare 
the prevalence of a particular genetic marker, or a set of markers, in case-control populations.
They are case-control studies based on comparison of unrelated case (affected or trait positive)
individuals and unrelated control (unaffected or trait negative or random) individuals.
Preferably the control group is composed of unaffected or trait negative individuals. Further,
the control group is ethnically matched to the case population. Moreover, the control group is
preferably matched to the case population for the main known confounding factor for the trait
under study (for example, age-matched for an age-dependent trait). Ideally, individuals in the
two samples are paired in such a way that they are expected to differ only in their disease status.
In the following, "trait positive population", "case population" and "affected population" are
used interchangeably.

An important step in the dissection of complex traits using association studies is the 
choice of case-control populations (see Lander and Schork, 1994). A major step in the choice
of case-control populations is the clinical definition of a given trait or phenotype. Any genetic
trait may be analysed by the association method proposed here by carefully selecting the
individuals to be included in the trait positive and trait negative phenotypic groups. Four
criteria are often useful: clinical phenotype, age at onset, family history and severity. The
selection procedure for continuous or quantitative traits (such as blood pressure, for example)
involves selecting individuals at opposite ends of the phenotype distribution of the trait under
study, so as to include in these trait positive and trait negative populations individuals with non-
overlapping phenotypes. Preferably, case-control populations comprise phenotypically
homogeneous populations. Trait positive and trait negative populations comprise
phenotypically uniform populations of individuals, each representing between 1 and 98%,
preferably between 1 and 80%, more preferably between 1 and 50%, and more preferably
between 1 and 30%, most preferably between 1 and 20% of the total population under study
and selected among individuals exhibiting non-overlapping phenotypes. The clearer the
difference between the two trait phenotypes, the greater the probability of detecting an
association with biallelic markers. The selection of those drastically different but relatively
uniform phenotypes enables efficient comparisons in association studies and the possible
detection of marked differences at the genetic level, provided that the sample sizes of the
populations under study are sufficiently large.

In preferred embodiments, a first group of between 50 and 300 trait positive
individuals, preferably about 100 individuals, is recruited according to their phenotypes. A
similar number of trait negative individuals is included in such studies.

In the present invention, typical examples of inclusion criteria include affection by 
schizophrenia. 

Association analysis

The general strategy to perform association studies using biallelic markers derived from
a region carrying a candidate gene is to scan two groups of individuals (case-control 
populations) in order to measure and statistically compare the allele frequencies of the biallelic 
markers of the present invention in both groups. 
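The statistical comparison of allele frequencies between the two groups can be sketched with a standard 2x2 allelic chi-square test and odds ratio. This is one common choice among the tests usable here, shown under the assumption that genotypes are coded as copies of allele a (0, 1 or 2) and that alleles (chromosomes), not individuals, are the counting unit; the function names are hypothetical.

```python
import math


def allelic_test(case_alleles, control_alleles):
    """2x2 allelic chi-square test (1 df) and allelic odds ratio.

    case_alleles, control_alleles: (count of allele a, count of allele b),
    i.e. chromosomes rather than individuals (two per genotyped person).
    """
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio


def allele_counts(genotypes):
    """Convert genotypes coded 0/1/2 (copies of allele a) to (a, b) counts."""
    a = sum(genotypes)
    return a, 2 * len(genotypes) - a


# Hypothetical counts: 100 case and 100 control chromosomes
print(allelic_test((60, 40), (40, 60)))
```

The p-value uses the identity P(chi-square with 1 df > x) = erfc(sqrt(x / 2)), which avoids any dependency beyond the standard library.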

If a statistically significant association with a trait is identified for at least one or more 
of the analysed biallelic markers, one can assume that: either the associated allele is directly 
responsible for causing the trait (the associated allele is the trait causing allele), or more likely 
the associated allele is in linkage disequilibrium with the trait causing allele. The specific 
characteristics of the associated allele with respect to the gene function usually give further
insight into the relationship between the associated allele and the trait (causal or in linkage 
disequilibrium). If the evidence indicates that the associated allele within the gene is most 
probably not the trait causing allele but is in linkage disequilibrium with the real trait causing 
allele, then the trait causing allele can be found by sequencing the vicinity of the associated 
marker. 

Another embodiment of the present invention encompasses methods of detecting an 
association between a haplotype and a phenotype, comprising the steps of: a) estimating the 
frequency of at least one haplotype in a trait positive population according to a method of 
estimating the frequency of a haplotype of the invention; b) estimating the frequency of said 
haplotype in a control population according to the method of estimating the frequency of a 
haplotype of the invention; and c) determining whether a statistically significant association 
exists between said haplotype and said phenotype. In addition, the methods of detecting an 
association between a haplotype and a phenotype of the invention encompass methods with any 
further limitation described in this disclosure, or those following, specified alone or in any 
combination: optionally, said 13q31-q33-related biallelic marker may be in a sequence
selected individually or in any combination from the group consisting of SEQ ID Nos 1 to 26,
36 to 40 and 54 to 229, and the complements thereof; optionally, said 13q31-q33-related
biallelic marker may be selected individually or in any combination from the biallelic markers
described in Tables 6b and 6c; optionally, said control population may be a trait negative
population, or a random population; optionally, said phenotype is a disease involving
schizophrenia, a response to an agent acting on schizophrenia, or a side effect of an agent
acting on schizophrenia.

Haplotype analysis 

As described above, when a chromosome carrying a disease allele first appears in a 
population as a result of either mutation or migration, the mutant allele necessarily resides on a 
chromosome having a set of linked markers: the ancestral haplotype. This haplotype can be
tracked through populations and its statistical association with a given trait can be analysed.
Complementing single point (allelic) association studies with multi-point association studies,
also called haplotype studies, increases the statistical power of association studies. Thus, a
haplotype association study allows one to define the frequency and the type of the ancestral 
carrier haplotype. A haplotype analysis is important in that it increases the statistical power of 
an analysis involving individual markers. 

In a first stage of a haplotype frequency analysis, the frequency of the possible 
haplotypes based on various combinations of the identified biallelic markers of the invention is 
determined. The haplotype frequency is then compared for distinct populations of trait positive 
and control individuals. The number of trait positive individuals that should be subjected to
this analysis to obtain statistically significant results usually ranges between 30 and 300, with a
preferred number of individuals ranging between 50 and 150. The same considerations apply to
the number of unaffected individuals (or random controls) used in the study. The results of this
first analysis provide haplotype frequencies in case-control populations; for each evaluated
haplotype frequency, a p-value and an odds ratio are calculated. If a statistically significant
association is found, the relative risk for an individual carrying the given haplotype of being
affected with the trait under study can be approximated.
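An odds ratio for one haplotype versus all others can be sketched from estimated haplotype frequencies (for example, EM estimates) and the number of individuals in each group. This is a simplification: it treats the expected chromosome counts 2 x N x frequency as if they were observed, ignoring the extra uncertainty from unknown phase, and uses Woolf's log-odds-ratio interval. The function name and example values are hypothetical.

```python
import math


def haplotype_odds_ratio(freq_cases, n_cases, freq_controls, n_controls, z=1.96):
    """Odds ratio (carrying a given haplotype vs. all others) with a Woolf
    confidence interval, from estimated haplotype frequencies and the number
    of individuals. Expected counts are treated as observed (simplification)."""
    a = 2 * n_cases * freq_cases              # case chromosomes with the haplotype
    b = 2 * n_cases * (1 - freq_cases)        # case chromosomes without it
    c = 2 * n_controls * freq_controls        # control chromosomes with it
    d = 2 * n_controls * (1 - freq_controls)  # control chromosomes without it
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, (lower, upper)


print(haplotype_odds_ratio(0.30, 100, 0.15, 100))
```

For a rare trait the odds ratio approximates the relative risk, which is how the approximation mentioned above is usually obtained.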
Interaction Analysis 

The biallelic markers of the present invention may also be used to identify patterns of 
biallelic markers associated with detectable traits resulting from polygenic interactions. The 
analysis of genetic interaction between alleles at unlinked loci requires individual genotyping 
using the techniques described herein. The analysis of allelic interaction among a selected set 
of biallelic markers with appropriate level of statistical significance can be considered as a 
haplotype analysis. Interaction analysis comprises stratifying the case-control populations with
respect to a given haplotype for the first locus and performing a haplotype analysis with the
second locus within each subpopulation.
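The stratification step can be sketched as below. As a simplification, each stratum is analysed with a single-marker allelic test at the second locus rather than a full haplotype analysis; the stratum label (for example, carrier status for a first-locus haplotype) is assumed to be known for each individual. The function name and data layout are hypothetical.

```python
import math
from collections import defaultdict


def stratified_allelic_tests(individuals):
    """individuals: iterable of (is_case, stratum, genotype), where stratum is
    e.g. carrier status for a first-locus haplotype and genotype is the count
    (0, 1 or 2) of allele a at the second locus.
    Returns {stratum: (chi-square, p-value)} from a 2x2 allelic test."""
    # tables[stratum][row][col]: row 0 = cases, 1 = controls; col 0 = allele a, 1 = allele b
    tables = defaultdict(lambda: [[0, 0], [0, 0]])
    for is_case, stratum, g in individuals:
        row = 0 if is_case else 1
        tables[stratum][row][0] += g
        tables[stratum][row][1] += 2 - g
    results = {}
    for stratum, ((a, b), (c, d)) in tables.items():
        n = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            results[stratum] = (float("nan"), float("nan"))  # a margin is empty
            continue
        chi2 = n * (a * d - b * c) ** 2 / denom
        results[stratum] = (chi2, math.erfc(math.sqrt(chi2 / 2.0)))
    return results
```

An association at the second locus that appears in one stratum but not in another suggests an interaction between the two loci.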

Statistical methods used in association studies are further described herein. 

4) Testing for linkage in the presence of association 

The biallelic markers of the present invention may further be used in TDT 
(transmission/disequilibrium test). TDT tests for both linkage and association and is not 
affected by population stratification. TDT requires data for affected individuals and their 
parents or data from unaffected sibs instead of from parents (see Spielmann S. et al., 1993;
Schaid D.J. et al., 1996; Spielmann S. and Ewens W.J., 1998). Such combined tests generally
reduce the false-positive errors produced by separate analyses.
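The classical trio form of the TDT can be sketched as a McNemar-type statistic on transmissions from heterozygous parents. This sketch assumes that the allele transmitted by each parent to the affected child has already been determined unambiguously, and covers only the parent-based version, not the sibling-based variant mentioned above. The function names are hypothetical.

```python
import math


def tdt(transmitted: int, not_transmitted: int):
    """Transmission/disequilibrium test from heterozygous parents.

    transmitted: times heterozygous parents transmitted allele a to an
    affected child; not_transmitted: times they transmitted allele b.
    McNemar-type statistic (b - c)^2 / (b + c), chi-square with 1 df.
    """
    b, c = transmitted, not_transmitted
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def count_transmissions(parents):
    """parents: (parent_genotype, allele_transmitted_by_that_parent) for each
    parent of an affected child; genotypes are pairs such as ("a", "b").
    Only heterozygous parents are informative."""
    b = c = 0
    for genotype, sent in parents:
        if set(genotype) == {"a", "b"}:
            if sent == "a":
                b += 1
            else:
                c += 1
    return b, c
```

Because each heterozygous parent serves as its own control (transmitted vs. untransmitted allele), the test is not biased by population stratification.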



WO 00/58510 PCT/IB00/00435


Statistical methods 

In general, any method known in the art to test whether a trait and a genotype show a 
statistically significant correlation may be used. 

1) Methods in linkage analysis 

Statistical methods and computer programs useful for linkage analysis are well-known 
to those skilled in the art (see Terwilliger J.D. and Ott J., 1994; Ott J., 1991). 

2) Methods to estimate haplotype frequencies in a population 

As described above, when genotypes are scored, it is often not possible to distinguish the
possible gametic phases of multiple heterozygotes, so haplotype frequencies cannot be easily
inferred. When the gametic phase is not known, haplotype frequencies can be estimated from the
multilocus genotypic data. Any method known to a person skilled in the art can be used to
estimate haplotype frequencies (see Lange K., 1997; Weir, B.S., 1996). Preferably,
maximum-likelihood haplotype frequencies are computed using an Expectation-Maximization
(EM) algorithm (see Dempster et al., 1977; Excoffier L. and Slatkin M., 1995). This procedure
is an iterative process aiming at obtaining maximum-likelihood estimates of haplotype
frequencies from multi-locus genotype data when the gametic phase is unknown. Haplotype
estimations are usually performed by applying the EM algorithm using, for example, the
EM-HAPLO program (Hawley M.E. et al., 1994) or the
Arlequin program (Schneider et al., 1997). The EM algorithm is a generalised iterative 
maximum likelihood approach to estimation and is briefly described below. 

In the following part of this text, phenotypes will refer to multi-locus genotypes with
unknown phase. Genotypes will refer to known-phase multi-locus genotypes. Suppose a
sample of N unrelated individuals typed for K markers. The data observed are the
unknown-phase K-locus phenotypes that can be categorised in F different phenotypes. Suppose
that we have H underlying possible haplotypes (in the case of K biallelic markers, $H = 2^K$).

For phenotype j, suppose that $c_j$ genotypes are possible. We thus have the following
equation:

$$P_j = \sum_{i=1}^{c_j} pr(\text{genotype}_i) = \sum_{i=1}^{c_j} pr(h_k, h_l) \qquad \text{(Equation 1)}$$

where $P_j$ is the probability of phenotype j, and $h_k$ and $h_l$ are the two haplotypes constituting
genotype i. Under Hardy-Weinberg equilibrium, $pr(h_k, h_l)$ becomes:

$$pr(h_k, h_l) = pr(h_k)^2 \ \text{if}\ h_k = h_l, \qquad pr(h_k, h_l) = 2\,pr(h_k)\,pr(h_l) \ \text{if}\ h_k \neq h_l \qquad \text{(Equation 2)}$$
The successive steps of the E-M algorithm can be described as follows:

Starting with initial values of the haplotype frequencies, noted $p_1^{(0)}, p_2^{(0)}, \ldots, p_H^{(0)}$,
these initial values serve to estimate the genotype frequencies (Expectation step) and then
estimate another set of haplotype frequencies (Maximisation step), noted $p_1^{(1)}, p_2^{(1)}, \ldots, p_H^{(1)}$;
these two steps are iterated until changes in the sets of haplotype frequencies are very small.

A stop criterion can be that the maximum difference between haplotype frequencies
between two iterations is less than $10^{-7}$. This value can be adjusted according to the desired
precision of estimations. In detail, at a given iteration s, the Expectation step comprises
calculating the genotype frequencies by the following equation:

$$pr(\text{genotype}_i)^{(s)} = pr(\text{phenotype}_j) \cdot pr(\text{genotype}_i \mid \text{phenotype}_j)^{(s)} = \frac{n_j}{N} \cdot \frac{pr(h_k, h_l)^{(s)}}{P_j^{(s)}} \qquad \text{(Equation 3)}$$

where genotype i occurs in phenotype j, $n_j$ is the number of individuals with phenotype j, and
$h_k$ and $h_l$ constitute genotype i. Each probability is derived according to Equation 1 and
Equation 2 described above.

Then the Maximisation step simply estimates another set of haplotype frequencies
given the genotype frequencies. This approach is also known as the gene-counting method
(Smith, 1957).

$$p_t^{(s+1)} = \frac{1}{2} \sum_{j=1}^{F} \sum_{i=1}^{c_j} \delta_{it} \cdot pr(\text{genotype}_i)^{(s)} \qquad \text{(Equation 4)}$$

where $\delta_{it}$ is an indicator variable which counts the number of times haplotype t occurs in
genotype i. It takes the values 0, 1 or 2.

To ensure that the estimate finally obtained is the maximum-likelihood estimate, several sets
of starting values are required. The estimates obtained are compared and, if they are
different, the estimates leading to the best likelihood are kept.
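The E and M steps of Equations 1 to 4, together with the use of several starting values, can be sketched for K biallelic markers as follows. This is an illustrative sketch, not the EM-HAPLO or Arlequin implementation: phenotypes are assumed to be coded as tuples with one entry per locus giving the number of copies of allele 1 (0, 1 or 2), all compatible haplotype pairs are enumerated explicitly (practical only for a modest number of heterozygous sites), and all function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product


def compatible_pairs(phenotype):
    """Unordered haplotype pairs (genotypes) consistent with an unphased
    phenotype coded 0/1/2 per biallelic locus (copies of allele 1)."""
    het = [i for i, g in enumerate(phenotype) if g == 1]
    base = [1 if g == 2 else 0 for g in phenotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for locus, bit in zip(het, bits):
            h1[locus], h2[locus] = bit, 1 - bit
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def pair_prob(p, hk, hl):
    """Equation 2: genotype probability under Hardy-Weinberg equilibrium."""
    return p[hk] ** 2 if hk == hl else 2.0 * p[hk] * p[hl]


def em_haplotypes(phenotypes, tol=1e-7, max_iter=1000, seed=None):
    counts = Counter(tuple(ph) for ph in phenotypes)  # n_j for each phenotype j
    n = sum(counts.values())                          # N individuals
    pairs = {ph: compatible_pairs(ph) for ph in counts}
    haps = sorted({h for ps in pairs.values() for pair in ps for h in pair})
    rng = random.Random(seed)
    start = {h: (rng.uniform(0.1, 1.0) if seed is not None else 1.0) for h in haps}
    total = sum(start.values())
    p = {h: v / total for h, v in start.items()}      # p^(0)
    for _ in range(max_iter):
        new_p = {h: 0.0 for h in haps}
        for ph, n_j in counts.items():
            p_j = sum(pair_prob(p, hk, hl) for hk, hl in pairs[ph])  # Equation 1
            for hk, hl in pairs[ph]:
                g_freq = (n_j / n) * pair_prob(p, hk, hl) / p_j      # Equation 3
                new_p[hk] += 0.5 * g_freq  # Equation 4: each copy adds 1/2
                new_p[hl] += 0.5 * g_freq
        converged = max(abs(new_p[h] - p[h]) for h in haps) < tol
        p = new_p
        if converged:
            break
    return p


def log_likelihood(phenotypes, p):
    return sum(math.log(sum(pair_prob(p, hk, hl) for hk, hl in compatible_pairs(tuple(ph))))
               for ph in phenotypes)


def best_of_starts(phenotypes, n_starts=5):
    """Run EM from several starting values and keep the best likelihood."""
    runs = [em_haplotypes(phenotypes, seed=s) for s in range(n_starts)]
    return max(runs, key=lambda p: log_likelihood(phenotypes, p))
```

With ten individuals homozygous for allele 2 at both loci, ten homozygous for allele 1 at both loci and one double heterozygote, the estimates converge to about 0.5 for each of the two "parental" haplotypes, because the homozygotes make the alternative phase of the double heterozygote very unlikely.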
3) Methods to calculate linkage disequilibrium between markers 

A number of methods can be used to calculate linkage disequilibrium between any two genetic positions. In practice, linkage disequilibrium is measured by applying a statistical association test to haplotype data taken from a population. Consider any pair of biallelic markers $(M_i, M_j)$ comprising at least one of the biallelic markers of the present invention, with alleles $(a_i/b_i)$ at marker $M_i$ and alleles $(a_j/b_j)$ at marker $M_j$. Linkage disequilibrium between them can be calculated for every allele combination ($a_i,a_j$; $a_i,b_j$; $b_i,a_j$ and $b_i,b_j$) according to the Piazza formula:

$$\Delta_{a_i a_j} = \sqrt{\theta_4} - \sqrt{(\theta_4+\theta_3)(\theta_4+\theta_2)}$$

where:
$\theta_4$ = (− −) = frequency of genotypes not having allele $a_i$ at $M_i$ and not having allele $a_j$ at $M_j$
$\theta_3$ = (− +) = frequency of genotypes not having allele $a_i$ at $M_i$ and having allele $a_j$ at $M_j$
$\theta_2$ = (+ −) = frequency of genotypes having allele $a_i$ at $M_i$ and not having allele $a_j$ at $M_j$
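A direct transcription of the Piazza formula from per-individual genotypes at $M_i$ and $M_j$ might look as follows. This is an illustrative sketch. The data layout (one pair of two-allele tuples per individual), the function name and the four-individual sample are assumptions.

```python
import math


def piazza_delta(genotypes, ai, aj):
    """Piazza estimate of LD between allele ai at M_i and allele aj at M_j.

    genotypes: one (genotype_at_Mi, genotype_at_Mj) pair per individual,
    each genotype being a two-allele tuple such as ("a", "b").
    """
    n = len(genotypes)
    # theta_4: carries neither ai at M_i nor aj at M_j ("- -")
    t4 = sum(ai not in gi and aj not in gj for gi, gj in genotypes) / n
    # theta_3: lacks ai at M_i but carries aj at M_j ("- +")
    t3 = sum(ai not in gi and aj in gj for gi, gj in genotypes) / n
    # theta_2: carries ai at M_i but lacks aj at M_j ("+ -")
    t2 = sum(ai in gi and aj not in gj for gi, gj in genotypes) / n
    return math.sqrt(t4) - math.sqrt((t4 + t3) * (t4 + t2))


# Hypothetical sample of four individuals
sample = [
    (("a", "a"), ("a", "a")),
    (("b", "b"), ("b", "b")),
    (("b", "b"), ("b", "b")),
    (("a", "b"), ("b", "b")),
]
print(piazza_delta(sample, "a", "a"))
```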
Linkage disequilibrium (LD) between pairs of biallelic markers $(M_i, M_j)$ can also be calculated for every allele combination ($a_i,a_j$; $a_i,b_j$; $b_i,a_j$ and $b_i,b_j$) according to the maximum-likelihood estimate (MLE) for delta (the composite genotypic disequilibrium coefficient), as described by Weir (Weir B.S., 1996). The MLE for the composite linkage disequilibrium is:

$$\Delta_{a_i a_j} = \frac{2n_1 + n_2 + n_3 + n_4/2}{N} - 2\,\mathrm{pr}(a_i)\,\mathrm{pr}(a_j)$$

where $n_1 = \sum \text{phenotype}(a_i/a_i,\ a_j/a_j)$, $n_2 = \sum \text{phenotype}(a_i/a_i,\ a_j/b_j)$, $n_3 = \sum \text{phenotype}(a_i/b_i,\ a_j/a_j)$, $n_4 = \sum \text{phenotype}(a_i/b_i,\ a_j/b_j)$ and N is the number of individuals in the sample. This formula allows linkage disequilibrium between alleles to be estimated when only genotype data, and not haplotype data, are available.
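Weir's composite estimate needs only unphased genotypes, which makes it straightforward to sketch. Here $\mathrm{pr}(a_i)$ and $\mathrm{pr}(a_j)$ are taken as sample allele frequencies (allele copies divided by 2N). That choice, the data layout and the function name are assumptions of this illustration.

```python
def composite_delta(genotypes, ai, aj):
    """Weir's composite genotypic disequilibrium for ai (at M_i) and aj (at M_j).

    genotypes: one (genotype_at_Mi, genotype_at_Mj) pair per individual.
    """
    n = len(genotypes)
    # Weights for n1 (ai/ai, aj/aj), n2 (ai/ai, aj/bj), n3 (ai/bi, aj/aj), n4 (ai/bi, aj/bj)
    weights = {(2, 2): 2.0, (2, 1): 1.0, (1, 2): 1.0, (1, 1): 0.5}
    weighted = 0.0
    copies_ai = copies_aj = 0
    for gi, gj in genotypes:
        ci, cj = gi.count(ai), gj.count(aj)
        weighted += weights.get((ci, cj), 0.0)
        copies_ai += ci
        copies_aj += cj
    p_ai = copies_ai / (2 * n)  # sample estimate of pr(a_i)
    p_aj = copies_aj / (2 * n)  # sample estimate of pr(a_j)
    return weighted / n - 2 * p_ai * p_aj


# Hypothetical sample of four individuals
sample = [
    (("a", "a"), ("a", "a")),
    (("b", "b"), ("b", "b")),
    (("b", "b"), ("b", "b")),
    (("a", "b"), ("b", "b")),
]
print(composite_delta(sample, "a", "a"))
```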

Another means of calculating the linkage disequilibrium between markers is as follows. For a couple of biallelic markers, $M_i$ $(a_i/b_i)$ and $M_j$ $(a_j/b_j)$, fitting the Hardy-Weinberg equilibrium, one can estimate the four possible haplotype frequencies in a given population according to the approach described above.

The estimation of gametic disequilibrium between $a_i$ and $a_j$ is simply:

$$D_{a_i a_j} = \mathrm{pr}(\text{haplotype}(a_i,a_j)) - \mathrm{pr}(a_i)\,\mathrm{pr}(a_j)$$

where $\mathrm{pr}(a_i)$ is the probability of allele $a_i$, $\mathrm{pr}(a_j)$ is the probability of allele $a_j$, and $\mathrm{pr}(\text{haplotype}(a_i,a_j))$ is estimated as in Equation 3 above.

For a couple of biallelic markers, only one measure of disequilibrium is necessary to describe the association between $M_i$ and $M_j$.
A normalised value of the above is then calculated as follows:

$$D'_{a_i a_j} = \frac{D_{a_i a_j}}{\max\left(-\mathrm{pr}(a_i)\,\mathrm{pr}(a_j),\ -\mathrm{pr}(b_i)\,\mathrm{pr}(b_j)\right)} \quad \text{if } D_{a_i a_j} < 0$$

$$D'_{a_i a_j} = \frac{D_{a_i a_j}}{\min\left(\mathrm{pr}(b_i)\,\mathrm{pr}(a_j),\ \mathrm{pr}(a_i)\,\mathrm{pr}(b_j)\right)} \quad \text{if } D_{a_i a_j} > 0$$
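The computation of D and its normalised value can be sketched from a haplotype frequency, for example one produced by the E-M procedure, and the two allele frequencies. The sketch uses Lewontin's normalisation: for D > 0 the denominator is the smaller of the two products. For D < 0 it is the negative product of smaller magnitude, so D' comes out positive, as in the formula above. The function name and example values are hypothetical.

```python
def gametic_disequilibrium(p_hap, p_ai, p_aj):
    """Return (D, D') for alleles a_i and a_j.

    p_hap: estimated frequency of haplotype (a_i, a_j).
    p_ai, p_aj: allele frequencies; b_i and b_j are the alternative alleles.
    """
    p_bi, p_bj = 1 - p_ai, 1 - p_aj
    d = p_hap - p_ai * p_aj
    if d < 0:
        d_max = max(-p_ai * p_aj, -p_bi * p_bj)  # negative, so D' is positive
    elif d > 0:
        d_max = min(p_bi * p_aj, p_ai * p_bj)  # Lewontin's bound for D > 0
    else:
        return 0.0, 0.0  # no disequilibrium
    return d, d / d_max


# Hypothetical values: pr(haplotype(a_i, a_j)) = 0.55, pr(a_i) = 0.6, pr(a_j) = 0.55
print(gametic_disequilibrium(0.55, 0.6, 0.55))
```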
The skilled person will readily appreciate that other LD calculation methods can be used 
without undue experimentation. 

Linkage disequilibrium among a set of biallelic markers having an adequate 
heterozygosity rate can be determined by genotyping between 50 and 1000 unrelated 
individuals, preferably between 75 and 200, more preferably around 100. 
4) Testing for association 

The statistical significance of a correlation between a phenotype and a genotype may be determined by any statistical test known in the art, using any accepted threshold of statistical significance. Here the genotype is an allele at a biallelic marker or a haplotype made up of such alleles. The choice of a particular method and significance threshold is well within the skill of the ordinary practitioner of the art.

Testing for association is performed by determining the frequency of a biallelic marker allele in case and control populations. These frequencies are compared with a statistical test to determine whether there is a statistically significant difference in frequency, which would indicate a correlation between the trait and the biallelic marker allele under study. Similarly, a haplotype analysis is performed by estimating the frequencies of all possible haplotypes for a given set of biallelic markers in case and control populations. These frequencies are compared with a statistical test to determine whether there is a statistically significant correlation between the haplotype and the phenotype (trait) under study. Any statistical tool useful to test for a statistically significant association between a genotype and a phenotype may be used. Preferably, the statistical test employed is a chi-square test with one degree of freedom. A P-value is calculated; the P-value is the probability that a statistic as large as or larger than the observed one would occur by chance.
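For the simplest case, a single biallelic marker compared between cases and controls, the chi-square test with one degree of freedom can be sketched on a 2 × 2 table of allele counts. The sketch omits Yates' continuity correction. It uses the identity that, for one degree of freedom, $P(\chi^2 > x) = \mathrm{erfc}(\sqrt{x/2})$. The function name and counts are hypothetical.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df, no continuity correction) on a 2 x 2 allele table.

    case_counts, control_counts: (copies of allele a, copies of allele b).
    Returns (chi-square statistic, P-value).
    """
    table = (case_counts, control_counts)
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][c] + table[1][c] for c in range(2)]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: allele a on 60 of 100 case and 40 of 100 control chromosomes
print(allelic_chi_square((60, 40), (40, 60)))
```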
Statistical significance 

In preferred embodiments, the p-value related to a biallelic marker association is preferably about $1 \times 10^{-2}$ or less, and more preferably about $1 \times 10^{-4}$ or less, for a single biallelic marker analysis. For a haplotype analysis involving several markers, it is preferably about $1 \times 10^{-3}$ or less, still more preferably $1 \times 10^{-6}$ or less and most preferably about $1 \times 10^{-8}$ or less. These thresholds apply to significance for diagnostic purposes, whether as a positive basis for further diagnostic tests or as a preliminary starting point for early preventive therapy. These values are believed to be applicable to any association study involving single or multiple marker combinations.

The skilled person can use the range of values set forth above as a starting point in 
order to carry out association studies with biallelic markers of the present invention. In doing 
so, significant associations between the biallelic markers of the present invention and diseases
involving schizophrenia can be revealed and used for diagnosis and drug screening purposes. 
Phenotypic permutation 

To confirm the statistical significance of the first-stage haplotype analysis described above, it may be useful to perform further analyses. In these analyses, genotyping data from case-control individuals are pooled and randomised with respect to the trait phenotype. The genotyping data of each individual are randomly allocated to one of two groups, which contain the same numbers of individuals as the case-control populations used to compile the data obtained in the first stage. A second-stage haplotype analysis is preferably run on these artificial groups, preferably for the markers included in the haplotype of the first-stage analysis showing the highest relative risk coefficient. This experiment is preferably repeated between 100 and 10,000 times. The repeated iterations allow the determination of the percentage of obtained haplotypes with a significant p-value level.
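The phenotypic permutation can be sketched as repeated random reallocation of pooled individuals into two groups with the original case and control sizes. The sketch summarises the iterations as the fraction of permuted data sets whose statistic reaches the observed one, which is one common choice and an assumption here. Each individual is reduced to a hypothetical 0/1 haplotype-carrier indicator, and the difference in carrier frequency stands in for the second-stage haplotype analysis. All names are hypothetical.

```python
import random


def carrier_difference(cases, controls):
    """Absolute difference in carrier frequency (1 = carries the haplotype)."""
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_fraction(cases, controls, statistic=carrier_difference,
                         n_perm=1000, seed=0):
    """Fraction of label permutations whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = statistic(cases, controls)
    pooled = list(cases) + list(controls)
    k = len(cases)  # artificial groups keep the original case/control sizes
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical data: every case carries the haplotype, no control does
print(permutation_fraction([1] * 20, [0] * 20))
```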
Assessment of statistical association 



WO 00/58510 PCT/IB00/00435


To address the problem of false positives, similar analyses may be performed with the same case-control populations in random genomic regions. Results in random regions and the candidate region are compared as described in US Provisional Patent Application entitled "Methods, software and apparati for identifying genomic regions harbouring a gene associated with a detectable trait".

5) Evaluation of risk factors 

The association between a risk factor and a disease is measured by the odds ratio (OR) and by the relative risk (RR). In genetic epidemiology, the risk factor is the presence or the absence of a certain allele or haplotype at marker loci. Let $P(R^+)$ be the probability of developing the disease for individuals with the risk factor R, and $P(R^-)$ the probability for individuals without it. The relative risk is then simply the ratio of the two probabilities:

$$RR = \frac{P(R^+)}{P(R^-)}$$

In case-control studies, direct measures of the relative risk cannot be obtained because of the sampling design. However, the odds ratio gives a good approximation of the relative risk for low-incidence diseases and can be calculated as:

$$OR = \frac{F^+/(1-F^+)}{F^-/(1-F^-)}$$

where $F^+$ is the frequency of exposure to the risk factor in cases and $F^-$ is the frequency of exposure to the risk factor in controls. $F^+$ and $F^-$ are calculated using the allelic or haplotype frequencies of the study and further depend on the underlying genetic model (dominant, recessive, additive...).
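These quantities are simple to compute once exposure frequencies are known. The sketch below uses the standard odds ratio $(F^+/(1-F^+))/(F^-/(1-F^-))$. For the dependence on the genetic model, it derives an exposure frequency from a risk-allele frequency q under Hardy-Weinberg proportions for the dominant and recessive models only. That assumption and all function names are illustrative, and other models (e.g. additive) are not covered.

```python
def relative_risk(p_with, p_without):
    """RR = P(R+) / P(R-)."""
    return p_with / p_without


def odds_ratio(f_cases, f_controls):
    """OR from exposure frequencies F+ (cases) and F- (controls)."""
    return (f_cases / (1 - f_cases)) / (f_controls / (1 - f_controls))


def exposure_frequency(q, model):
    """Exposure frequency from risk-allele frequency q, assuming Hardy-Weinberg proportions."""
    if model == "dominant":  # one or two copies of the allele confer exposure
        return 1 - (1 - q) ** 2
    if model == "recessive":  # two copies are required
        return q ** 2
    raise ValueError("model must be 'dominant' or 'recessive'")


# Hypothetical exposure frequencies: 50% of cases, 25% of controls
print(odds_ratio(0.5, 0.25))
```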

One can further estimate the attributable risk (AR), which describes the proportion of individuals in a population exhibiting a trait due to a given risk factor. This measure is important in quantifying the role of a specific factor in disease etiology and in assessing the public health impact of a risk factor. Its public health relevance lies in estimating the proportion of cases of disease in the population that could be prevented if the exposure of interest were absent. AR is determined as follows:

$$AR = \frac{P_E(RR-1)}{P_E(RR-1)+1}$$

AR is the risk attributable to a biallelic marker allele or a biallelic marker haplotype. $P_E$ is the frequency of exposure to an allele or a haplotype within the population at large, and RR is the relative risk, which is approximated by the odds ratio when the trait under study has a relatively low incidence in the general population.
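The AR formula translates directly into code. In the sketch below, the function name and the example values (20% exposure, RR = 3) are hypothetical.

```python
def attributable_risk(p_exposure, rr):
    """AR = P_E (RR - 1) / (P_E (RR - 1) + 1)."""
    excess = p_exposure * (rr - 1)
    return excess / (excess + 1)


# Hypothetical example: 20% of the population carries the haplotype, RR = 3
print(attributable_risk(0.2, 3.0))
```

When RR equals 1 the factor carries no excess risk, and AR is 0 regardless of how common the exposure is.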


Association of Biallelic Markers of the Invention with Schizophrenia
In the context of the present invention, an association was established between chromosome 13q31-q33-related biallelic markers, including Region D biallelic markers, and schizophrenia and bipolar disorder. Several association studies were carried out, using different populations and screening samples thereof, and different sets of biallelic markers distributed over the chromosome 13q31-q33 region and Region D thereof. Further details concerning these association studies and their results are provided herein in Examples 5a to 5e.

This information is extremely valuable. The knowledge of a potential genetic 
predisposition to schizophrenia, even if this predisposition is not absolute, might contribute in a 
very significant manner to treatment efficacy of schizophrenia and to the development of new
therapeutic and diagnostic tools. 

Identification Of Biallelic Markers In Linkage Disequilibrium With The Biallelic 
Markers of the Invention 

Once a first biallelic marker has been identified in a genomic region of interest, the practitioner of ordinary skill in the art, using the teachings of the present invention, can easily identify additional biallelic markers in linkage disequilibrium with this first marker. As mentioned before, any marker in linkage disequilibrium with a first marker associated with a trait will be associated with the trait. Therefore, once an association has been demonstrated between a given biallelic marker and a trait, the discovery of additional biallelic markers associated with this trait is of great interest in order to increase the density of biallelic markers in this particular region. The causal gene or mutation is expected to be found in the vicinity of the marker or set of markers showing the highest correlation with the trait.

Identification of additional markers in linkage disequilibrium with a given marker involves: (a) amplifying a genomic fragment comprising a first biallelic marker from a plurality of individuals; (b) identifying second biallelic markers in the genomic region harboring said first biallelic marker; (c) conducting a linkage disequilibrium analysis between said first biallelic marker and second biallelic markers; and (d) selecting said second biallelic markers as being in linkage disequilibrium with said first marker. Subcombinations comprising steps (b) and (c) are also contemplated.
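Step (d) requires a rule for deciding when a second marker counts as being in linkage disequilibrium with the first. The text does not fix one. The sketch below assumes the squared correlation $r^2 = D^2/(p_{a_i}p_{b_i}p_{a_j}p_{b_j})$ with a hypothetical cut-off of 0.8. The marker names and frequencies are also hypothetical, and the haplotype frequencies would in practice come from the E-M estimation described above.

```python
def r_squared(p_hap, p_ai, p_aj):
    """Squared correlation r^2 = D^2 / (p_ai p_bi p_aj p_bj) between two biallelic markers."""
    d = p_hap - p_ai * p_aj
    return d * d / (p_ai * (1 - p_ai) * p_aj * (1 - p_aj))


def select_ld_markers(candidates, threshold=0.8):
    """Keep second markers whose r^2 with the first marker reaches the threshold.

    candidates: marker name -> (freq of haplotype (a_first, a_second),
                                p_a_first, p_a_second).
    """
    return sorted(
        name
        for name, (p_hap, p_ai, p_aj) in candidates.items()
        if r_squared(p_hap, p_ai, p_aj) >= threshold
    )


# Hypothetical second markers: "M2" in complete LD, "M3" in linkage equilibrium
print(select_ld_markers({"M2": (0.5, 0.5, 0.5), "M3": (0.25, 0.5, 0.5)}))
```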

Methods to identify biallelic markers and to conduct linkage disequilibrium analysis are described herein and can be carried out by the skilled person without undue experimentation. The present invention therefore also concerns biallelic markers and other polymorphisms which are in linkage disequilibrium with the specific biallelic markers of the invention and which are expected to present similar characteristics in terms of their respective association with a given trait. In a preferred embodiment, the invention concerns biallelic markers which are in linkage disequilibrium with the specific biallelic markers.



Identification Of Functional Mutations 

Once a positive association is confirmed with a biallelic marker of the present invention, the associated candidate gene sequence can be scanned for mutations by comparing the sequences of a selected number of trait-positive and trait-negative individuals. In a preferred embodiment, functional regions such as exons and splice sites, promoters and other regulatory regions of the gene are scanned for mutations. Preferably, trait-positive individuals carry the haplotype shown to be associated with the trait, and trait-negative individuals do not carry the haplotype or allele associated with the trait. The mutation detection procedure is essentially similar to that used for biallelic site identification.

The method used to detect such mutations generally comprises the following steps: (a) amplification of a region of the candidate DNA sequence comprising a biallelic marker or a group of biallelic markers associated with the trait from DNA samples of trait-positive patients and trait-negative controls; (b) sequencing of the amplified region; (c) comparison of DNA sequences from trait-positive patients and trait-negative controls; and (d) determination of mutations specific to trait-positive patients. Subcombinations which comprise steps (b) and (c) are specifically contemplated.
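Steps (c) and (d) can be sketched as a position-by-position comparison of sequenced amplicons. The sketch assumes the sequences are already aligned to equal length with no indels. It flags any base seen in trait-positive sequences that never occurs in trait-negative ones. Such positions are only candidates and still require confirmation in a larger population. The function name and sequences are hypothetical.

```python
def case_specific_variants(case_seqs, control_seqs):
    """Positions where a base seen in trait-positive sequences never occurs in
    trait-negative sequences.

    Assumes all sequences are aligned to the same length (no indels).
    Returns a list of (0-based position, sorted case-only bases).
    """
    seqs = list(case_seqs) + list(control_seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for pos in range(length):
        control_bases = {s[pos] for s in control_seqs}
        case_only = sorted({s[pos] for s in case_seqs} - control_bases)
        if case_only:
            hits.append((pos, case_only))
    return hits


# Hypothetical aligned amplicons from two patients and two controls
print(case_specific_variants(["ACGT", "ACTT"], ["ACGT", "ACGT"]))
```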

It is preferred that candidate polymorphisms then be verified by screening a larger population of cases and controls by means of any genotyping procedure such as those described herein, preferably using a microsequencing technique in an individual test format. Polymorphisms are considered as candidate mutations when present in cases and controls at frequencies compatible with the expected association results.

Candidate polymorphisms and mutations of the sbgl nucleic acid sequences suspected of being involved in a predisposition to schizophrenia can be confirmed by screening a larger population of affected and unaffected individuals using any of the genotyping procedures described herein. Preferably, the microsequencing technique is used. Such polymorphisms are considered as candidate "trait-causing" mutations when they exhibit a statistically significant correlation with the detectable phenotype.






Biallelic Markers Of The Invention In Methods Of Genetic Diagnostics 
The biallelic markers and other polymorphisms of the present invention can also be used to develop diagnostic tests capable of identifying individuals who express a detectable trait as the result of a specific genotype, or individuals whose genotype places them at risk of developing a detectable trait at a subsequent time. The trait analyzed using the present diagnostics may be any detectable trait, including predisposition to schizophrenia, age of onset of detectable symptoms, a beneficial response to treatment against schizophrenia, or side effects related to such treatment. Such a diagnosis can be useful in the monitoring, prognosis and/or prophylactic or curative therapy for schizophrenia.

The diagnostic techniques of the present invention may employ a variety of 
methodologies to determine whether a test subject has a genotype associated with an increased 
risk of developing a detectable trait or whether the individual suffers from a detectable trait as a 
result of a particular mutation, including methods which enable the analysis of individual 
chromosomes for haplotyping, such as family studies, single sperm DNA analysis or somatic 
hybrids. 

The diagnostic techniques concern the detection of specific alleles present within the 
human chromosome 13q31-q33 region; optionally within the Region D subregion; and 
optionally within an sbgl, g34665, sbg2, g35017 or g35018 nucleic acid sequence. More 
particularly, the invention concerns the detection of a nucleic acid comprising at least one of the 
nucleotide sequences of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 or a fragment thereof or a 
complementary sequence thereto including the polymorphic base. 

These methods involve obtaining a nucleic acid sample from the individual and determining whether the nucleic acid sample contains at least one allele or at least one biallelic marker haplotype indicative of a risk of developing the trait. Alternatively, the allele or haplotype may indicate that the individual expresses the trait as a result of possessing a particular human chromosome 13q31-q33 region, Region D, sbgl, g34665, sbg2, g35017 or g35018-related polymorphism or mutation (trait-causing allele).

Preferably, in such diagnostic methods, a nucleic acid sample is obtained from the 
individual and this sample is genotyped using methods described above in "Methods Of 
Genotyping DNA Samples For Biallelic markers." The diagnostics may be based on a single 
biallelic marker or on a group of biallelic markers.

In each of these methods, a nucleic acid sample is obtained from the test subject and the 
biallelic marker pattern of one or more of the biallelic markers of the invention is determined. 

In one embodiment, a PCR amplification is conducted on the nucleic acid sample to 
amplify regions in which polymorphisms associated with a detectable phenotype have been 
identified. The amplification products are sequenced to determine whether the individual 
possesses one or more human chromosome 13q31-q33 region, Region D, sbgl, g34665, sbg2, g35017 or g35018-related polymorphisms associated with a detectable phenotype. The primers used to generate amplification products may comprise the primers listed in Table 6a. Alternatively, the nucleic acid sample is subjected to microsequencing reactions as described above. These reactions determine whether the individual possesses one or more human chromosome 13q31-q33 region-related polymorphisms associated with a detectable phenotype resulting from a mutation or a polymorphism in the human chromosome 13q31-q33 region, Region D, sbgl, g34665, sbg2, g35017 or g35018-related biallelic marker. The primers used in the microsequencing reactions may include the primers listed in Table 6d. In another embodiment, the nucleic acid sample is contacted with one or more allele-specific oligonucleotide probes which specifically hybridize to one or more human chromosome 13q31-q33 region, Region D, sbgl, g34665, sbg2, g35017 or g35018-related alleles associated with a detectable phenotype. The probes used in the hybridization assay may include the probes listed in Table 6c. In another embodiment, the nucleic acid sample is contacted with a second oligonucleotide capable of producing an amplification product when used with the allele-specific oligonucleotide in an amplification reaction. The presence of an amplification product in the amplification reaction indicates that the individual possesses one or more human chromosome 13q31-q33 region, Region D, sbgl, g34665, sbg2, g35017 or g35018-related alleles associated with a detectable phenotype.

In a preferred embodiment, the detectable trait is schizophrenia, and the identity of the nucleotide present at at least one biallelic marker, or the complement thereof, is determined. The biallelic marker is selected from the group consisting of A1 to A69, A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to A222, A224 to A246, A250, A251, A253, A255, A259, A266, A268 to A232, A328 to A360 and A361 to A489. Diagnostic kits comprise any of the polynucleotides of the present invention.

These diagnostic methods are extremely valuable as they can, in certain circumstances, 
be used to initiate preventive treatments or to allow an individual carrying a significant 
haplotype to foresee warning signs such as minor symptoms. 

Diagnostics, which analyze and predict response to a drug or side effects to a drug, may 
be used to determine whether an individual should be treated with a particular drug. For 
example, if the diagnostic indicates a likelihood that an individual will respond positively to 
treatment with a particular drug, the drug may be administered to the individual. Conversely, if 
the diagnostic indicates that an individual is likely to respond negatively to treatment with a 
particular drug, an alternative course of treatment may be prescribed. A negative response may 
be defined as either the absence of an efficacious response or the presence of toxic side effects. 
Clinical drug trials represent another application for the markers of the present
invention. One or more markers indicative of response to an agent acting against schizophrenia
or to side effects to an agent acting against schizophrenia may be identified using the methods
described above. Thereafter, potential participants in clinical trials of such an agent may be 
screened to identify those individuals most likely to respond favorably to the drug and exclude 
those likely to experience side effects. In that way, the effectiveness of drug treatment may be 
measured in individuals who respond positively to the drug, without lowering the measurement 
as a result of the inclusion of individuals who are unlikely to respond positively in the study and 
without risking undesirable safety problems. 

PREVENTION, DIAGNOSIS AND TREATMENT OF PSYCHIATRIC DISEASE 

An aspect of the present invention relates to the preparation of a medicament for the 
treatment of psychiatric disease, in particular schizophrenia and bipolar disorder. The present 
invention embodies medicaments acting on sbgl, g34665, sbg2, g35017 or g35018.

In preferred embodiments, medicaments of the invention act on sbgl, either directly or indirectly, by acting on the sbgl pathways. For example, the medicaments may modulate, and more preferably decrease, the level of sbgl activity which occurs in a cell or particular tissue, or increase or decrease the activity of the sbgl protein. In certain embodiments, the invention thus comprises use of a compound capable of increasing or decreasing sbgl expression or sbgl protein activity in the preparation or manufacture of a medicament. Preferably, said compound is used for the treatment of a psychiatric disease, preferably for the treatment of schizophrenia or bipolar disorder. Preferably, said compound acts directly by binding to sbgl or an sbgl receptor.

Such medicaments may also increase or decrease the activity of any of the following:
- a compound analogous to sbgl;
- a compound comprising an amino acid sequence having at least 25% homology to a sequence selected from the group consisting of SEQ ID NOs. 27 to 35;
- a compound comprising an amino acid sequence having at least 50% homology to a sequence selected from the group consisting of SEQ ID NOs. 27 to 35; and
- a compound comprising an amino acid sequence having at least 80% homology to a sequence selected from the group consisting of SEQ ID NOs. 27 to 35.

Medicaments which increase or decrease the activity of these compounds in an
individual may be used to ameliorate or prevent symptoms in individuals suffering from or 
predisposed to a psychiatric disease, as discussed above in the section entitled "indications". 

Alternatively, sbgl activity may be increased or decreased by the expression of the genes encoding the identified sbgl-modulating compounds using gene therapy. Examples of vectors and promoters suitable for use in gene therapy are described above. Sbgl activity may also be increased or decreased by preparing an antibody which binds to an sbgl peptide, an sbgl receptor or a protein related thereto, or to fragments of these proteins. Such antibodies may modulate the interaction between sbgl and an sbgl receptor or a protein related thereto. Antibodies and methods of obtaining them are further described herein.

As described above, the present invention provides cellular assays for identifying 
compounds for the treatment of psychiatric disease. The assays are based on detection of sbgl 
expression, measurement of sbgl protein activity, or based on the determination of other suitable
schizophrenia, bipolar disorder or related psychiatric disease endpoints. Compounds for the 
treatment of psychiatric disease include derivative proteins or peptides which are capable of 
inhibiting the activity of a wild type sbgl protein, which may be identified by determining their 
ability to bind a wild type sbgl protein. Compounds also include antibodies, and small molecules 
and drugs which may be obtained using a variety of synthetic approaches familiar to those skilled 
in the art, including combinatorial chemistry based techniques. 

The invention further encompasses said methods for the prevention, treatment, and 
diagnosis of disease using any of the g34665, sbg2, g35017 or g35018 nucleic acids or proteins
of the invention in analogous methods. 

Sbgl in Methods of Diagnosis or Detecting Predisposition 

Individuals affected by or predisposed to schizophrenia and bipolar disorder may express abnormal levels of sbgl, g34665, sbg2, g35017 or g35018. Individuals having increased or decreased sbgl, g34665, sbg2, g35017 or g35018 activity in their plasma, body fluids, or body tissues may be at risk of developing schizophrenia, bipolar disorder or a variety of potentially related psychiatric conditions.

One aspect of the present invention is a method for determining whether an individual is at risk of suffering from, or is currently suffering from, schizophrenia, bipolar disorder or another psychiatric disease, comprising determining whether the individual has an abnormal level of sbgl activity in plasma, body fluids, or body tissues. The other psychiatric diseases are those defined by the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) classification, including other psychotic disorders, mood disorders, autism, substance dependence or alcoholism, mental retardation, and cognitive, anxiety, eating, impulse-control and personality disorders.

The level of sbgl or analogous compounds in plasma, body fluids, or body tissues may be determined using a variety of approaches. In particular, the level may be determined using ELISA, Western blots, or protein electrophoresis.

Biallelic Markers Of The Invention In Methods Of Genetic Diagnostics 
The biallelic markers and other polymorphisms of the present invention can also be 
used to develop diagnostics tests capable of identifying individuals who express a detectable 
trait as the result of a specific genotype or individuals whose genotype places them at risk of 
developing a detectable trait at a subsequent time. The present diagnostics may be used to diagnose any detectable trait, including predisposition to schizophrenia or bipolar disorder, age of onset of detectable symptoms, a beneficial response to treatment against schizophrenia or bipolar disorder, or side effects related to such treatment. Such a diagnosis can be useful in the monitoring, prognosis and/or prophylactic or curative therapy for schizophrenia or bipolar disorder.

The diagnostic techniques of the present invention may employ a variety of 
methodologies to determine whether a test subject has a genotype associated with an increased 
risk of developing a detectable trait or whether the individual suffers from a detectable trait as a
result of a particular mutation, including methods which enable the analysis of individual 
chromosomes for haplotyping, such as family studies, single sperm DNA analysis or somatic 
hybrids. 

The diagnostic techniques concern the detection of specific alleles present within the 
human chromosome 13q31-q33 region; optionally within the Region D subregion; and
optionally within an sbgl, g34665, sbg2, g35017 or g35018 nucleic acid sequence. More
particularly, the invention concerns the detection of a nucleic acid comprising at least one of the 
nucleotide sequences of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 or a fragment thereof or a 
complementary sequence thereto including the polymorphic base. 

These methods involve obtaining a nucleic acid sample from the individual and determining whether the nucleic acid sample contains at least one allele or at least one biallelic marker haplotype indicative of a risk of developing the trait. Alternatively, the allele or haplotype may indicate that the individual expresses the trait as a result of possessing a particular human chromosome 13q31-q33 region-related polymorphism or mutation (trait-causing allele).

Preferably, in such diagnostic methods, a nucleic acid sample is obtained from the 
individual and this sample is genotyped using methods described above in "Methods Of 
Genotyping DNA Samples For Biallelic markers." The diagnostics may be based on a single 
biallelic marker or on a group of biallelic markers.

In each of these methods, a nucleic acid sample is obtained from the test subject and the 
biallelic marker pattern of one or more biallelic markers of the invention is determined.

In one embodiment, a PCR amplification is conducted on the nucleic acid sample to 
amplify regions in which polymorphisms associated with a detectable phenotype have been 
identified. The amplification products are sequenced to determine whether the individual 
possesses one or more human chromosome 13q31-q33 region, Region D, sbgl, g34665, sbg2,
g35017 or g35018-related polymorphisms associated with a detectable phenotype. The primers
used to generate amplification products may comprise the primers listed in Table 6a. 
Alternatively, the nucleic acid sample is subjected to microsequencing reactions as described above. These reactions determine whether the individual possesses one or more human chromosome 13q31-q33 region, Region D, sbgl, g34665, sbg2, g35017 or g35018-related polymorphisms associated with a detectable phenotype resulting from a mutation or a polymorphism in the human chromosome 13q31-q33 region. The primers used in the microsequencing reactions may include the primers listed in Table 6d. In another embodiment, the nucleic acid sample is contacted with one or more allele-specific oligonucleotide probes which specifically hybridize to one or more human chromosome 13q31-q33 region, Region D, sbgl, g34665, sbg2, g35017 or g35018-related alleles associated with a detectable phenotype. The probes used in the hybridization assay may include the probes listed in Table 6b. In another embodiment, the nucleic acid sample is contacted with a second oligonucleotide capable of producing an amplification product when used with the allele-specific oligonucleotide in an amplification reaction. The presence of an amplification product in the amplification reaction indicates that the individual possesses one or more human chromosome 13q31-q33 region, Region D, sbgl, g34665, sbg2, g35017 or g35018-related alleles associated with a detectable phenotype. In a preferred embodiment, the detectable trait is schizophrenia or bipolar disorder. Diagnostic kits comprise any of the polynucleotides of the present invention.

These diagnostic methods are extremely valuable as they can, in certain circumstances,
be used to initiate preventive treatments or to allow an individual carrying a significant
haplotype to foresee warning signs such as minor symptoms.

Diagnostics that analyze and predict response to a drug or side effects of a drug may
be used to determine whether an individual should be treated with a particular drug. For
example, if the diagnostic indicates a likelihood that an individual will respond positively to
treatment with a particular drug, the drug may be administered to the individual. Conversely, if
the diagnostic indicates that an individual is likely to respond negatively to treatment with a
particular drug, an alternative course of treatment may be prescribed. A negative response may
be defined as either the absence of an efficacious response or the presence of toxic side effects.

Clinical drug trials represent another application for the markers of the present
invention. One or more markers indicative of response to an agent acting against schizophrenia,
or of side effects to such an agent, may be identified using the methods described above.
Thereafter, potential participants in clinical trials of such an agent may be screened to identify
those individuals most likely to respond favorably to the drug and to exclude those likely to
experience side effects. In that way, the effectiveness of drug treatment may be measured in
individuals who respond positively to the drug, without the measurement being lowered by the
inclusion in the study of individuals who are unlikely to respond positively, and without risking
undesirable safety problems.
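The dilution effect described above can be illustrated with a simple weighted average. In the sketch below, the carrier fraction and the response rates in marker carriers and non-carriers are invented numbers chosen only for illustration; they are not results obtained with the markers of the invention.

```python
def expected_response_rate(carrier_fraction: float,
                           rate_in_carriers: float,
                           rate_in_noncarriers: float) -> float:
    """Overall response rate of a trial population that mixes marker
    carriers (assumed likely responders) and non-carriers."""
    if not 0.0 <= carrier_fraction <= 1.0:
        raise ValueError("carrier_fraction must lie in [0, 1]")
    return (carrier_fraction * rate_in_carriers
            + (1.0 - carrier_fraction) * rate_in_noncarriers)


if __name__ == "__main__":
    # Hypothetical figures: 40% carriers, 70% vs 20% response.
    unscreened = expected_response_rate(0.4, 0.7, 0.2)
    screened = expected_response_rate(1.0, 0.7, 0.2)  # only carriers enrolled
    print(f"unscreened={unscreened:.2f} screened={screened:.2f}")
```

Screening does not change how well the drug works in responders; it only removes the non-responders who pull the measured average down (here from 0.70 to 0.40 under the assumed numbers).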

Prevention And Treatment Of Disease Using Biallelic Markers 
In large part because of the risk of suicide, the detection of susceptibility to
schizophrenia, bipolar disorder and other psychiatric diseases in individuals is very
important. Consequently, the invention concerns a method for the treatment of schizophrenia or 
bipolar disorder, or a related disorder comprising the following steps: 
- selecting an individual whose DNA comprises alleles of a biallelic marker or of a group of 
biallelic markers of the human chromosome 13q31-q33 region, preferably Region D-related 
markers, and more preferably sbgl, g34665, sbg2, g35017 or g35018-related markers associated 
with schizophrenia or bipolar disorder; 

- following up said individual for the appearance (and optionally the development) of 
the symptoms related to schizophrenia or bipolar disorder; and 

- administering a treatment acting against schizophrenia or bipolar disorder or against symptoms 
thereof to said individual at an appropriate stage of the disease. 

Another embodiment of the present invention comprises a method for the treatment of 
schizophrenia or bipolar disorder comprising the following steps: 

- selecting an individual whose DNA comprises alleles of a biallelic marker or of a
group of biallelic markers of the human chromosome 13q31-q33 region, preferably Region
D-related markers, and more preferably sbgl, g34665, sbg2, g35017 or g35018-related markers
associated with schizophrenia or bipolar disorder;

- administering a preventive treatment of schizophrenia or bipolar disorder to said individual. 

In a further embodiment, the present invention concerns a method for the treatment of 
schizophrenia or bipolar disorder comprising the following steps: 

- selecting an individual whose DNA comprises alleles of a biallelic marker or of a
group of biallelic markers of the human chromosome 13q31-q33 region, preferably Region D-related
markers, and more preferably sbgl, g34665, sbg2, g35017 or g35018-related markers associated
with schizophrenia or bipolar disorder;

- administering a preventive treatment of schizophrenia or bipolar disorder to said 
individual; 

- following up said individual for the appearance and the development of schizophrenia 
or bipolar disorder symptoms; and optionally 

- administering a treatment acting against schizophrenia or bipolar disorder or against 
symptoms thereof to said individual at the appropriate stage of the disease. 

For use in the determination of the course of treatment of an individual suffering from
the disease, the present invention also concerns a method for the treatment of schizophrenia or
bipolar disorder comprising the following steps:

- selecting an individual suffering from schizophrenia or bipolar disorder whose DNA
comprises alleles of a biallelic marker or of a group of biallelic markers of the human
chromosome 13q31-q33 region, preferably Region D-related markers, and more preferably sbgl,
g34665, sbg2, g35017 or g35018-related markers, associated with the severity of schizophrenia
or bipolar disorder or of the symptoms thereof; and

- administering a treatment acting against schizophrenia or bipolar disorder or 
symptoms thereof to said individual. 

The invention also concerns a method for the treatment of schizophrenia or bipolar 
disorder in a selected population of individuals. The method comprises: 

- selecting an individual suffering from schizophrenia or bipolar disorder and whose 
DNA comprises alleles of a biallelic marker or of a group of biallelic markers of the human 
chromosome 13q31-q33 region, preferably Region D-related markers, and more preferably
sbgl, g34665, sbg2, g35017 or g35018-related markers associated with a positive response to 
treatment with an effective amount of a medicament acting against schizophrenia or bipolar 
disorder or symptoms thereof, 

- and/or whose DNA does not comprise alleles of a biallelic marker or of a group of 
biallelic markers of the human chromosome 13q31-q33 region, preferably Region D-related 
markers, and more preferably sbgl, g34665, sbg2, g35017 or g35018-related markers associated 
with a negative response to treatment with said medicament; and 

- administering at suitable intervals an effective amount of said medicament to said 
selected individual. 

In the context of the present invention, a "positive response" to a medicament can be
defined as comprising a reduction of the symptoms related to the disease. In the context of the
present invention, a "negative response" to a medicament can be defined as comprising either a
lack of positive response to the medicament, that is, no reduction of symptoms, or a side effect
observed following administration of the medicament.
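The two definitions above can be expressed as a small classification rule. This sketch assumes that symptoms are summarized by a numeric severity score (for example a total score on a clinical rating scale, where higher means worse) and that `min_reduction` is a hypothetical threshold; the text does not say how to treat a patient who improves but also experiences a side effect, so the sketch counts that case as negative because the definition of a negative response includes side effects.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    baseline_score: float   # symptom severity before treatment (higher = worse)
    followup_score: float   # symptom severity after treatment
    side_effect: bool       # any side effect observed after administration


def classify_response(o: Outcome, min_reduction: float = 0.0) -> str:
    """Return "positive" or "negative" following the definitions above.

    A drop in score larger than `min_reduction` counts as symptom reduction;
    a side effect makes the response negative regardless (design choice).
    """
    reduced = (o.baseline_score - o.followup_score) > min_reduction
    if o.side_effect or not reduced:
        return "negative"
    return "positive"


if __name__ == "__main__":
    print(classify_response(Outcome(90, 70, False)))                  # positive
    print(classify_response(Outcome(90, 70, True)))                   # negative
    print(classify_response(Outcome(90, 88, False), min_reduction=5))  # negative
```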

The invention also relates to a method of determining whether a subject is likely to 
respond positively to treatment with a medicament. The method comprises identifying a first 
population of individuals who respond positively to said medicament and a second population 
of individuals who respond negatively to said medicament. One or more biallelic markers
associated with a positive response to said medicament are identified in the first population, or
one or more biallelic markers associated with a negative response to said medicament are
identified in the second population. The biallelic markers may be identified using the
techniques described herein. 
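The text leaves the statistical method to the techniques described elsewhere in the document. As a generic stand-in, not necessarily the method used by the inventors, a common first step is to compare allele counts at one biallelic marker between responders and non-responders with a 2x2 Pearson chi-square test. The counts in the example are hypothetical, and a real analysis would also have to account for testing many markers.

```python
import math
from typing import Tuple


def allelic_chi_square(responders: Tuple[int, int],
                       non_responders: Tuple[int, int]) -> Tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    responders / non_responders: (count of allele 1, count of allele 2),
    counting two alleles per genotyped individual.
    Returns (chi2, p_value).
    """
    table = (responders, non_responders)
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("empty table")
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column of the table is empty")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 50 responders and 50 non-responders (100 alleles each).
    chi2, p = allelic_chi_square((60, 40), (40, 60))
    print(f"chi2={chi2:.2f} p={p:.4f}")  # chi2=8.00 p=0.0047
```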
A DNA sample is then obtained from the subject to be tested. The DNA sample is
analyzed to determine whether it comprises alleles of one or more biallelic markers associated
with a positive response to treatment with the medicament and/or alleles of one or more biallelic
markers associated with a negative response to treatment with the medicament.

In some embodiments, the medicament may be administered to the subject in a clinical 
trial if the DNA sample contains alleles of one or more biallelic markers associated with a 
positive response to treatment with the medicament and/or if the DNA sample lacks alleles of 
one or more biallelic markers associated with a negative response to treatment with the 
medicament. In preferred embodiments, the medicament is a drug acting against schizophrenia 
or bipolar disorder. 

Using the method of the present invention, the evaluation of drug efficacy may be 
conducted in a population of individuals likely to respond favorably to the medicament. 

Another aspect of the invention is a method of using a medicament comprising 
obtaining a DNA sample from a subject, determining whether the DNA sample contains alleles 
of one or more biallelic markers associated with a positive response to the medicament and/or 
whether the DNA sample contains alleles of one or more biallelic markers associated with a 
negative response to the medicament, and administering the medicament to the subject if the 
DNA sample contains alleles of one or more biallelic markers associated with a positive 
response to the medicament and/or if the DNA sample lacks alleles of one or more biallelic 
markers associated with a negative response to the medicament. 
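The "and/or" decision rule above can be sketched as follows. Which allele of which marker predicts a positive or a negative response is not specified here, so the allele assignments in the example are hypothetical and the marker names are reused only as labels. The sketch treats an ungenotyped marker as "allele not carried"; for negative-response markers that is a permissive choice, and a more cautious implementation might refuse to decide until the marker is typed.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]  # the two alleles observed at one biallelic marker


def carries_any(genotypes: Dict[str, Genotype], alleles: Dict[str, str]) -> bool:
    """True if the subject carries the listed allele at any listed marker.
    Markers that were not genotyped are treated as not carried."""
    return any(marker in genotypes and allele in genotypes[marker]
               for marker, allele in alleles.items())


def should_administer(genotypes: Dict[str, Genotype],
                      positive: Dict[str, str],
                      negative: Dict[str, str],
                      mode: str = "both") -> bool:
    """Apply the 'and/or' rule: 'both' = carries a positive-response allele
    AND lacks every negative-response allele; 'positive' or 'negative'
    applies only one half of the rule."""
    has_positive = carries_any(genotypes, positive)
    has_negative = carries_any(genotypes, negative)
    if mode == "both":
        return has_positive and not has_negative
    if mode == "positive":
        return has_positive
    if mode == "negative":
        return not has_negative
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Hypothetical allele assignments for illustration only.
    subject = {"A62": ("A", "G"), "A65": ("C", "C")}
    print(should_administer(subject, positive={"A62": "G"}, negative={"A65": "T"}))  # True
```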

The invention also concerns a method for the clinical testing of a medicament,
preferably a medicament acting against schizophrenia or bipolar disorder or symptoms
thereof. The method comprises the following steps:

- administering a medicament, preferably a medicament likely to act against
schizophrenia or bipolar disorder or symptoms thereof, to a heterogeneous population of
individuals,

- identifying a first population of individuals who respond positively to said 
medicament and a second population of individuals who respond negatively to said 
medicament, 

- identifying biallelic markers in said first population which are associated with a
positive response to said medicament,

- selecting individuals whose DNA comprises biallelic markers associated with a 
positive response to said medicament, and 

- administering said medicament to said individuals. 

In any of the methods for the prevention, diagnosis and treatment of schizophrenia and
bipolar disorder, including methods of using a medicament, clinical testing of a medicament and
determining whether a subject is likely to respond positively to treatment with a medicament,
said biallelic marker may optionally comprise:

(a) a biallelic marker selected from the group consisting of biallelic markers A1 to A489;

(b) a biallelic marker selected from the group consisting of biallelic markers A1 to A69,
A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to
A222, A224 to A242, A250 to A251, A259, A269 to A270, A278, A285 to A295, A303 to
A307, A330, A334 to A335 and A346 to A357;

(c) a biallelic marker selected from the group consisting of biallelic markers A1 to A69,
A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to
A222, A224 to A246, A250, A251, A253, A255, A259, A266, A268 to A232 and A328 to
A489;

(d) a biallelic marker selected from the group consisting of sbgl-related markers A85 to
A219, or more preferably a biallelic marker selected from the group consisting of sbgl-related
markers A85 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197 and A199 to
A219;

(e) a biallelic marker selected from the group consisting of g34665-related markers 
A230 to A236; 

(f) a biallelic marker selected from the group consisting of sbg2-related markers A79 to A99;

(g) the g35017-related biallelic marker A41; 


(h) a biallelic marker selected from the group consisting of g35018-related markers A1 to A39;

(i) a biallelic marker selected from the group consisting of A239, A227, A198, A228,
A223, A107, A218, A270, A75, A62, A65 and A70;

(j) a biallelic marker selected from the group consisting of A48, A60, A61, A62, A65,
A70, A75, A76, A80, A107, A108, A198, A218, A221, A223, A227, A228, A239, A285,
A286, A287, A288, A290, A292, A293, A295, A299 and A304;

(k) a biallelic marker selected from the group consisting of A304, A307, A305, A298,
A292, A293, A291, A287, A286, A288, A289, A290, A295, A299, A241, A239, A228,
A227, A223, A221, A218, A198, A178, A108, A107, A80, A75, A70, A65 and
A62; and/or

(l) a biallelic marker selected from the group consisting of A304, A307, A305, A298,
A292, A293, A291, A287, A286, A288, A289, A290, A295, A299, A241, A239, A228, A227,
A223, A221, A218, A198, A178, A108, A107, A80, A76, A75, A70, A65, A62, A61, A60 and
A48.
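When checking whether a genotyped marker falls into one of the groups (a) to (l), it can help to expand the range notation into explicit marker names. The sketch below is a hypothetical helper that parses lists such as "A1 to A69, A71 to A74 and A330". It rejects descending ranges rather than silently returning nothing; note that the range "A268 to A232" printed in item (c) runs backwards and would be flagged by this helper.

```python
import re
from typing import Set


def expand_markers(spec: str) -> Set[str]:
    """Expand a marker list such as 'A1 to A69, A71 to A74 and A330'
    into the set of individual marker names {'A1', ..., 'A330'}."""
    markers: Set[str] = set()
    for token in re.split(r",|\band\b", spec):
        token = token.strip()
        if not token:
            continue
        m = re.fullmatch(r"A(\d+)(?:\s+to\s+A?(\d+))?", token)
        if m is None:
            raise ValueError(f"unrecognized marker token: {token!r}")
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        if end < start:
            # Flag descending ranges instead of silently returning nothing.
            raise ValueError(f"descending range: {token!r}")
        markers.update(f"A{i}" for i in range(start, end + 1))
    return markers


if __name__ == "__main__":
    group_f = expand_markers("A79 to A99")  # sbg2-related markers, item (f)
    group_g = expand_markers("A41")         # g35017-related marker, item (g)
    print(len(group_f), "A85" in group_f, sorted(group_g))  # 21 True ['A41']
```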

Such methods are deemed to be extremely useful to increase the benefit/risk ratio
resulting from the administration of medicaments which may cause undesirable side effects
and/or be inefficacious in a portion of the patient population to which they are normally
administered.

Once an individual has been diagnosed as suffering from schizophrenia or bipolar 
disorder, selection tests are carried out to determine whether the DNA of this individual 
comprises alleles of a biallelic marker or of a group of biallelic markers associated with a 
positive response to treatment or with a negative response to treatment which may include either 
side effects or unresponsiveness. 

The selection of the patient to be treated using the method of the present invention can
be carried out through the detection methods described above. The individuals who are to be
selected are preferably those whose DNA does not comprise alleles of a biallelic marker or of a
group of biallelic markers associated with a negative response to treatment. The knowledge of
an individual's genetic predisposition to unresponsiveness or side effects to particular 
medicaments allows the clinician to direct treatment toward appropriate drugs against 
schizophrenia or bipolar disorder or symptoms thereof. 

Once the patient's genetic predispositions have been determined, the clinician can select 
appropriate treatment for which negative response, particularly side effects, has not been 
reported or has been reported only marginally for the patient. 

The biallelic markers of the invention have demonstrated an association with
schizophrenia and bipolar disorder. However, the present invention also comprises any of the
prevention, diagnostic, prognosis and treatment methods described herein using the biallelic
markers of the invention in methods of preventing, diagnosing, managing and treating related
disorders, particularly related CNS disorders. By way of example, related disorders may comprise
psychotic disorders, mood disorders, autism, substance dependence and alcoholism, mental
retardation, and other psychiatric diseases including cognitive, anxiety, eating, impulse-control
and personality disorders, as defined by the Diagnostic and Statistical Manual of Mental
Disorders, fourth edition (DSM-IV) classification.
Recombinant Vectors 

The term "vector" is used herein to designate either a circular or a linear DNA or RNA
molecule, which is either double-stranded or single-stranded, and which comprises at least one
polynucleotide of interest that is sought to be transferred into a host cell or into a unicellular or
multicellular host organism.
The present invention encompasses a family of recombinant vectors that comprise a
polynucleotide derived from an sbgl, g34665, sbg2, g35017 or g35018 nucleic acid sequence. 
Consequently, the present invention further comprises recombinant vectors comprising: 

(a) sbgl genomic DNA or cDNAs comprised in the nucleic acids of any of nucleotide 
positions 215819 to 215941, 215819 to 215975, 216661 to 216952, 216661 to 217061, 217027 
to 217061, 229647 to 229742, 230408 to 230721, 231272 to 231412, 231787 to 231880,
231870 to 231879, 234174 to 234321, 237406 to 237428, 239719 to 239807, 239719 to 
239853, 240528 to 240569, 240528 to 240596, 240528 to 240617, 240528 to 240644, 240528 
to 240824, 240528 to 240994, 240528 to 241685 and 240800 to 240993 of SEQ ID No. 1, SEQ 
ID Nos 2 to 26 and primate sbgl DNAs of SEQ ID Nos 54 to 111, and the complements
thereof; 

(b) g34665 genomic DNA or cDNAs comprised in the nucleic acids of any of 
nucleotide positions 292653 to 292841, 295555 to 296047 and 295580 to 296047 of SEQ ID 
No. 1, and the complements thereof;

(c) sbg2 genomic DNA or cDNAs comprised in the nucleic acids of any of nucleotide 
positions 201188 to 201234, 214676 to 214793, 215702 to 215746 and 216836 to 216915 of
SEQ ID No. 1, and the complements thereof; 

(d) g35017 genomic DNA or cDNAs comprised in the nucleic acids of any of 
nucleotide positions 94124 to 94964 of SEQ ID No. 1, and the complements thereof;

(e) g35018 genomic DNA or cDNAs comprised in the nucleic acids of any of 
nucleotide positions 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to 25740, 29388 to
29502, 29967 to 30282, 64666 to 64812, and 65505 to 65853 of SEQ ID No. 1, and the 
complements thereof. 
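The position ranges listed above can be pulled out of a sequence programmatically. This sketch assumes that positions are 1-based and inclusive, as is conventional for sequence listings, and interprets "the complements thereof" as the reverse complement written 5' to 3'; both are assumptions. SEQ ID No. 1 itself is not reproduced here, so a short toy sequence stands in for it, while the g35018 ranges are copied from item (e).

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract(seq: str, start: int, end: int) -> str:
    """Return positions start..end of seq (1-based, inclusive)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} outside sequence of length {len(seq)}")
    return seq[start - 1:end]


def reverse_complement(seq: str) -> str:
    """Complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def total_length(ranges) -> int:
    """Summed length of inclusive (start, end) position ranges."""
    return sum(end - start + 1 for start, end in ranges)


# g35018 position ranges of SEQ ID No. 1, copied from item (e) above.
G35018_RANGES = [(1108, 1289), (14877, 14920), (18778, 18862), (25593, 25740),
                 (29388, 29502), (29967, 30282), (64666, 64812), (65505, 65853)]

if __name__ == "__main__":
    toy = "ACGTACGGTT"  # stand-in; not a real fragment of SEQ ID No. 1
    frag = extract(toy, 5, 9)
    print(frag, reverse_complement(frag))  # ACGGT ACCGT
    print(total_length(G35018_RANGES))     # 1386
```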

Generally, a recombinant vector of the invention may comprise any of the 
polynucleotides described herein, as well as any sbgl, g34665, sbg2, g35017 or g35018 primer
or probe as defined above. 

In a first preferred embodiment, a recombinant vector of the invention is used to
amplify the inserted polynucleotide derived from an sbgl, g34665, sbg2, g35017 or g35018
genomic sequence or cDNA of the invention in a suitable cell host, this polynucleotide being
amplified each time that the recombinant vector replicates.

A second preferred embodiment of the recombinant vectors according to the invention
comprises expression vectors comprising either a regulatory polynucleotide or a coding nucleic
acid of the invention, or both. Within certain embodiments, expression vectors are employed to
express an sbgl, g34665, sbg2, g35017 or g35018 polypeptide, which can then be purified and,
for example, be used in ligand screening assays or as an immunogen in order to raise specific
antibodies directed against an sbgl, g34665, sbg2, g35017 or g35018 protein. In other
embodiments, the expression vectors are used for constructing transgenic animals and also for
gene therapy. Expression requires that appropriate signals be provided in the vectors, said
signals including various regulatory elements, such as enhancers/promoters from both viral and
mammalian sources that drive expression of the genes of interest in host cells. Dominant drug
selection markers for establishing permanent, stable cell clones expressing the products are
generally included in the expression vectors of the invention, as are elements that link
expression of the drug selection markers to expression of the polypeptide.

More particularly, the present invention relates to expression vectors which include 
nucleic acids encoding an sbgl, g34665, sbg2, g35017 or g35018 protein or variants or 
fragments thereof, under the control of a regulatory sequence of the respective sbgl, g34665,
sbg2, g35017 or g35018 regulatory polynucleotides, or alternatively under the control of an
exogenous regulatory sequence.

The invention also pertains to a recombinant expression vector useful for the expression
of an sbgl, g34665, sbg2, g35017 or g35018 cDNA sequence.

Recombinant vectors comprising a nucleic acid containing a human chromosome
13q31-q33-related biallelic marker, preferably a Region D-related biallelic marker or more
preferably an sbgl-, g34665-, sbg2-, g35017- or g35018-related biallelic marker, are also part of
the invention. In a preferred embodiment, said biallelic marker is selected from the group
consisting of A1 to A489, and the complements thereof.

Some of the elements which can be found in the vectors of the present invention are 
described in further detail in the following sections. 

1. General features of the expression vectors of the invention 

A recombinant vector according to the invention comprises, but is not limited to, a 
YAC (Yeast Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a 
phagemid, a cosmid, a plasmid or even a linear DNA molecule which may comprise a 
chromosomal, non-chromosomal, semi-synthetic or synthetic DNA. Such a recombinant
vector can comprise a transcriptional unit comprising an assembly of: 

(1) a genetic element or elements having a regulatory role in gene expression, for 
example promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from
about 10 to 300 bp in length, that act on the promoter to increase transcription.

(2) a structural or coding sequence which is transcribed into mRNA and eventually 
translated into a polypeptide, said structural or coding sequence being operably linked to the 
regulatory elements described in (1); and 

(3) appropriate transcription initiation and termination sequences. Structural units
intended for use in yeast or eukaryotic expression systems preferably include a leader sequence
enabling extracellular secretion of translated protein by a host cell. Alternatively, when a
recombinant protein is expressed without a leader or transport sequence, it may include an
N-terminal methionine residue. This residue may or may not be subsequently cleaved from the
expressed recombinant protein to provide a final product.

Generally, recombinant expression vectors will include origins of replication, selectable 
markers permitting transformation of the host cell, and a promoter derived from a highly 
expressed gene to direct transcription of a downstream structural sequence. The heterologous 
structural sequence is assembled in appropriate phase with translation initiation and termination 
sequences, and preferably a leader sequence capable of directing secretion of the translated 
protein into the periplasmic space or the extracellular medium. In a specific embodiment
wherein the vector is adapted for transfecting and expressing desired sequences in mammalian 
host cells, preferred vectors will comprise an origin of replication in the desired host, a suitable 
promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, 
splice donor and acceptor sites, transcriptional termination sequences, and 5'-flanking
non-transcribed sequences. DNA sequences derived from the SV40 viral genome, for example
SV40 origin, early promoter, enhancer, splice and polyadenylation sites may be used to provide 
the required non-transcribed genetic elements. 

The in vivo expression of an sbgl, g34665, sbg2, g35017 or g35018 polypeptide or
fragments or variants thereof may be useful in order to correct a genetic defect related to the
expression of the native gene in a host organism or to the production of a biologically inactive
sbgl, g34665, sbg2, g35017 or g35018 protein.

Consequently, the present invention also comprises recombinant expression vectors 
mainly designed for the in vivo production of the sbgl, g34665, sbg2, g35017 or g35018 
polypeptide by the introduction of the appropriate genetic material in the organism of the patient 
to be treated. In preferred embodiments, said genetic material comprises at least one nucleotide
sequence selected from the group of nucleotide position ranges consisting of:

(a) sbgl genomic DNA or cDNAs comprised in the nucleic acids of any of nucleotide 
positions 215819 to 215941, 215819 to 215975, 216661 to 216952, 216661 to 217061, 217027 
to 217061, 229647 to 229742, 230408 to 230721, 231272 to 231412, 231787 to 231880,
231870 to 231879, 234174 to 234321, 237406 to 237428, 239719 to 239807, 239719 to
239853, 240528 to 240569, 240528 to 240596, 240528 to 240617, 240528 to 240644, 240528
to 240824, 240528 to 240994, 240528 to 241685 and 240800 to 240993 of SEQ ID No. 1, SEQ
ID Nos 2 to 26 and primate sbgl DNAs of SEQ ID Nos. 54 to 111, and the complements
thereof; 
(b) g34665 genomic DNA or cDNAs comprised in the nucleic acids of any of
nucleotide positions 292653 to 292841, 295555 to 296047 and 295580 to 296047 of SEQ ID
No. 1, and the complements thereof;

(c) sbg2 genomic DNA or cDNAs comprised in the nucleic acids of any of nucleotide 
positions 201188 to 201234, 214676 to 214793, 215702 to 215746 and 216836 to 216915 of
SEQ ID No. 1, and the complements thereof; 

(d) g35017 genomic DNA or cDNAs comprised in the nucleic acids of any of
nucleotide positions 94124 to 94964 of SEQ ID No. 1, and the complements thereof; and 

(e) g35018 genomic DNA or cDNAs comprised in the nucleic acids of any of
nucleotide positions 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to 25740, 29388 to 
29502, 29967 to 30282, 64666 to 64812, and 65505 to 65853 of SEQ ID No. 1, and the 
complements thereof. 

This genetic material may be introduced in vitro into a cell that has been previously
extracted from the organism, the modified cell being subsequently reintroduced into said
organism, or directly in vivo into the appropriate tissue.

2. Regulatory Elements 

Promoters 

The suitable promoter regions used in the expression vectors according to the present 
invention are chosen taking into account the cell host in which the heterologous gene has to be 
expressed. The particular promoter employed to control the expression of a nucleic acid 
sequence of interest is not believed to be important, so long as it is capable of directing the 
expression of the nucleic acid in the targeted cell. Thus, where a human cell is targeted, it is 
preferable to position the nucleic acid coding region adjacent to and under the control of a 
promoter that is capable of being expressed in a human cell, such as, for example, a human or a 
viral promoter. 

A suitable promoter may be heterologous with respect to the nucleic acid for which it 
controls the expression or alternatively can be endogenous to the native polynucleotide 
containing the coding sequence to be expressed. Additionally, the promoter is generally
heterologous with respect to the recombinant vector sequences within which the construct 
promoter/coding sequence has been inserted. 

Promoter regions can be selected from any desired gene using, for example, CAT
(chloramphenicol acetyltransferase) vectors and more preferably pKK232-8 and pCM7 vectors.

Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA
polymerase promoters, the gpt, lambda PR, PL and trp promoters (EP 0036776), the polyhedrin
promoter, or the p10 protein promoter from baculovirus (Kit Novagen) (Smith et al., 1983;
O'Reilly et al., 1992), the lambda PR promoter or also the trc promoter.

Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and
late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient
vector and promoter is well within the level of ordinary skill in the art of genetic engineering;
see, for example, the book of Sambrook et al. (1989) or the procedures described by
Fuller et al. (1996).
Other regulatory elements 

One will typically desire to include a polyadenylation signal to effect proper
polyadenylation of the gene transcript. The nature of the polyadenylation signal is not believed
to be crucial to the successful practice of the invention, and any such sequence may be
employed, such as human growth hormone and SV40 polyadenylation signals. Also
contemplated as an element of the expression cassette is a terminator. These elements can serve
to enhance message levels and to minimize read-through from the cassette into other sequences.

The vector containing the appropriate DNA sequence as described above, more
preferably an sbgl gene regulatory polynucleotide, a polynucleotide encoding an sbgl, g34665,
sbg2, g35017 or g35018 polypeptide comprising at least one nucleotide sequence selected from
the group of nucleotide sequence ranges consisting of:

(a) sbgl genomic DNA or cDNAs comprised in the nucleic acids of any of nucleotide
positions 215819 to 215941, 215819 to 215975, 216661 to 216952, 216661 to 217061, 217027
to 217061, 229647 to 229742, 230408 to 230721, 231272 to 231412, 231787 to 231880,
231870 to 231879, 234174 to 234321, 237406 to 237428, 239719 to 239807, 239719 to
239853, 240528 to 240569, 240528 to 240596, 240528 to 240617, 240528 to 240644, 240528
to 240824, 240528 to 240994, 240528 to 241685 and 240800 to 240993 of SEQ ID No. 1, SEQ
ID Nos 2 to 26 and primate sbgl DNAs of SEQ ID Nos. 54 to 111, and the complements
thereof;

(b) g34665 genomic DNA or cDNAs comprised in the nucleic acids of any of 
nucleotide positions 292653 to 292841, 295555 to 296047 and 295580 to 296047 of SEQ ID 
No. 1, and the complements thereof;

(c) sbg2 genomic DNA or cDNAs comprised in the nucleic acids of any of nucleotide
positions 201188 to 201234, 214676 to 214793, 215702 to 215746 and 216836 to 216915 of
SEQ ID No. 1, and the complements thereof;

(d) g35017 genomic DNA or cDNAs comprised in the nucleic acids of any of
nucleotide positions 94124 to 94964 of SEQ ID No. 1, and the complements thereof; 

(e) g35018 genomic DNA or cDNAs comprised in the nucleic acids of any of
nucleotide positions 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to 25740, 29388 to
29502, 29967 to 30282, 64666 to 64812, and 65505 to 65853 of SEQ ID No. 1, and the
complements thereof.

3. Selectable Markers 

Such markers would confer an identifiable change to the cell permitting easy 
identification of cells containing the expression construct. The selectable marker genes for 
selection of transformed host cells are preferably dihydrofolate reductase or neomycin 
resistance for eukaryotic cell culture, TRP1 for S. cerevisiae or tetracycline, rifampicin or 
ampicillin resistance in E. coli, or levan saccharase for mycobacteria, this latter marker being a 
negative selection marker. 

4. Preferred Vectors. 
Bacterial vectors 

As a representative but non-limiting example, useful expression vectors for bacterial 
use can comprise a selectable marker and a bacterial origin of replication derived from 
commercially available plasmids comprising genetic elements of pBR322 (ATCC 37017). Such
commercial vectors include, for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and
GEM1 (Promega Biotec, Madison, WI, USA).

Large numbers of other suitable vectors are known to those of skill in the art and are
commercially available, such as the following bacterial vectors: pQE70, pQE60, pQE-9
(Qiagen), pBS, pD10, phagescript, psiX174, pBluescript SK, pBSKS, pNH8A, pNH16A,
pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5
(Pharmacia); pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG,
pSVL (Pharmacia); pQE-30 (QIAexpress).
Bacteriophage vectors 

The P1 bacteriophage vector may contain large inserts ranging from about 80 to about 100 kb.

The construction of P1 bacteriophage vectors such as p158 or p158/neo8 is notably
described by Sternberg (1992, 1994). Recombinant P1 clones comprising sbg1 polynucleotide
sequences may be designed for inserting large polynucleotides of more than 40 kb (Linton et al.,
1993). To generate P1 DNA for transgenic experiments, a preferred protocol is the one
described by McCormick et al. (1994). Briefly, E. coli (preferably strain NS3529) harboring the
P1 plasmid is grown overnight in a suitable broth medium containing 25 µg/ml of kanamycin.
The P1 DNA is prepared from the E. coli by alkaline lysis using the Qiagen Plasmid Maxi kit
(Qiagen, Chatsworth, CA, USA), according to the manufacturer's instructions. The P1 DNA is
purified from the bacterial lysate on two Qiagen-tip 500 columns, using the washing and elution
buffers contained in the kit. A phenol/chloroform extraction is then performed before
precipitating the DNA with 70% ethanol. After solubilizing the DNA in TE (10 mM Tris-HCl,
pH 7.4, 1 mM EDTA), the concentration of the DNA is assessed by spectrophotometry.
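As a rough illustration of the spectrophotometric step, the sketch below converts an absorbance reading into a DNA concentration. It assumes the commonly used approximation that an A260 of 1.0 (1 cm path length) corresponds to about 50 µg/ml of double-stranded DNA; the function name and the example reading are hypothetical and are not part of the protocol described above.

```python
# Minimal sketch: estimate dsDNA concentration from absorbance at 260 nm.
# Assumption: 1 A260 unit (1 cm path) corresponds to ~50 ug/ml of dsDNA.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Return the estimated concentration of the undiluted DNA solution in ug/ml."""
    if a260 < 0:
        raise ValueError("absorbance cannot be negative")
    if dilution_factor <= 0:
        raise ValueError("dilution factor must be positive")
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


if __name__ == "__main__":
    # Hypothetical reading: P1 DNA in TE diluted 1:50 gives A260 = 0.12.
    conc = dsdna_concentration_ug_per_ml(0.12, dilution_factor=50)
    print(f"{conc:.0f} ug/ml")  # 0.12 x 50 x 50 = 300 ug/ml
```

The conversion factor is kept as a named constant so that a different factor (for example one calibrated on a given instrument) can be substituted without touching the function.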

When the goal is to express a P1 clone comprising an sbg1 polynucleotide sequence in
a transgenic animal, typically in transgenic mice, it is desirable to remove vector sequences
from the P1 DNA fragment, for example by cleaving the P1 DNA at rare-cutting sites within the
P1 polylinker (SfiI, NotI or SalI). The P1 insert is then purified from vector sequences on a
pulsed-field agarose gel, using methods similar to those originally reported for the isolation of
DNA from YACs (Schedl et al., 1993a; Peterson et al., 1993). At this stage, the resulting
purified insert DNA can be concentrated, if necessary, on a Millipore Ultrafree-MC Filter Unit
(Millipore, Bedford, MA, USA; 30,000 molecular weight limit) and then dialyzed against
microinjection buffer (10 mM Tris-HCl, pH 7.4; 250 µM EDTA) containing 100 mM NaCl,
30 µM spermine and 70 µM spermidine on a microdialysis membrane (type VS, 0.025 µm,
from Millipore). The intactness of the purified P1 DNA insert is assessed by electrophoresis
on a 1% agarose (SeaKem GTG; FMC BioProducts) pulsed-field gel and staining with
ethidium bromide.
Baculovirus vectors 

A suitable vector for the expression of an sbg1 polypeptide encoded by polynucleotides
of SEQ ID No. 1 or fragments or variants thereof is a baculovirus vector that can be propagated
in insect cells and in insect cell lines. A specific suitable host vector system is the
pVL1392/1393 baculovirus transfer vector (Pharmingen) that is used to transfect the Sf9 cell
line (ATCC N° CRL-1711), which is derived from Spodoptera frugiperda.

Other suitable vectors for the expression of the sbg1 polypeptide encoded by SEQ
ID No. 1 or fragments or variants thereof in a baculovirus expression system include those
described by Chai et al. (1993), Vlasak et al. (1983) and Lenhard et al. (1996).

Viral vectors 

In one specific embodiment, the vector is derived from an adenovirus. Preferred
adenovirus vectors according to the invention are those described by Feldman and Steg (1996)
or Ohno et al. (1994). Another preferred recombinant adenovirus according to this specific
embodiment of the present invention is the human adenovirus type 2 or 5 (Ad 2 or Ad 5) or an
adenovirus of animal origin (French patent application N° FR-93.05954).

Retrovirus vectors and adeno-associated virus vectors are generally understood to be
the recombinant gene delivery systems of choice for the transfer of exogenous polynucleotides
in vivo, particularly to mammals, including humans. These vectors provide efficient delivery of
genes into cells, and the transferred nucleic acids are stably integrated into the chromosomal
DNA of the host.

Particularly preferred retroviruses for the preparation or construction of retroviral in
vitro or in vivo gene delivery vehicles of the present invention include retroviruses selected
from the group consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus,
Reticuloendotheliosis virus and Rous Sarcoma virus. Particularly preferred Murine Leukemia
Viruses include the 4070A and the 1504A viruses, Abelson (ATCC No. VR-999), Friend
(ATCC No. VR-245), Gross (ATCC No. VR-590), Rauscher (ATCC No. VR-998) and Moloney
Murine Leukemia Virus (ATCC No. VR-190; PCT Application No. WO 94/24298). Particularly
preferred Rous Sarcoma Viruses include Bryan high titer (ATCC Nos. VR-334, VR-657, VR-726,
VR-659 and VR-728). Other preferred retroviral vectors are those described in Rom et
al. (1996), PCT Application No. WO 93/25234, PCT Application No. WO 94/06920, Roux et al.
(1989), Julan et al. (1992) and Neda et al. (1991).

Yet another viral vector system that is contemplated by the invention comprises the
adeno-associated virus (AAV). The adeno-associated virus is a naturally occurring defective
virus that requires another virus, such as an adenovirus or a herpes virus, as a helper virus for
efficient replication and a productive life cycle (Muzyczka et al., 1992). It is also one of the
few viruses that may integrate its DNA into non-dividing cells, and it exhibits a high frequency
of stable integration (Flotte et al., 1992; Samulski et al., 1989; McLaughlin et al., 1989). One
advantageous feature of AAV derives from its reduced efficacy for transducing primary cells
relative to transformed cells.
BAC vectors 

The bacterial artificial chromosome (BAC) cloning system (Shizuya et al., 1992) has
been developed to stably maintain large fragments of genomic DNA (100-300 kb) in E. coli. A
preferred BAC vector is the pBeloBAC11 vector, which has been described by Kim et
al. (1996). BAC libraries are prepared with this vector using size-selected genomic DNA that
has been partially digested using enzymes that permit ligation into either the BamHI or HindIII
sites in the vector. Flanking these cloning sites are T7 and SP6 RNA polymerase transcription
initiation sites that can be used to generate end probes by either RNA transcription or PCR
methods. After the construction of a BAC library in E. coli, BAC DNA is purified from the
host cell as a supercoiled circle. Converting these circular molecules into a linear form precedes
both size determination and introduction of the BACs into recipient cells. The cloning site is
flanked by two NotI sites, permitting cloned segments to be excised from the vector by NotI
digestion. Alternatively, the DNA insert contained in the pBeloBAC11 vector may be
linearized by treatment of the BAC vector with the commercially available enzyme lambda
terminase, which cleaves at the unique cosN site, but this cleavage method results in a
full-length BAC clone containing both the insert DNA and the BAC sequences.
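The effect of NotI digestion on a circular BAC can be illustrated by computing the expected fragment sizes from a sequence. This is a minimal sketch that assumes the NotI recognition sequence GCGGCCGC with cleavage after the second base (GC^GGCCGC) and complete digestion; because the site is palindromic, scanning one strand is enough. The function names and toy sequences are hypothetical.

```python
NOTI_SITE = "GCGGCCGC"   # NotI recognition sequence (palindromic)
NOTI_CUT_OFFSET = 2      # top-strand cut: GC^GGCCGC


def find_sites_circular(seq: str, site: str) -> list[int]:
    """0-based start positions of `site` in a circular sequence,
    including occurrences that span the end/start junction."""
    seq, site = seq.upper(), site.upper()
    n = len(seq)
    extended = seq + seq[: len(site) - 1]  # wrap around the junction
    return [i for i in range(n) if extended.startswith(site, i)]


def circular_digest_fragments(seq: str, site: str = NOTI_SITE,
                              cut_offset: int = NOTI_CUT_OFFSET) -> list[int]:
    """Fragment lengths after complete digestion of a circular molecule."""
    n = len(seq)
    cuts = sorted((start + cut_offset) % n for start in find_sites_circular(seq, site))
    if not cuts:
        return [n]  # no site: the molecule stays circular and uncut
    fragments = []
    for k, cut in enumerate(cuts):
        next_cut = cuts[(k + 1) % len(cuts)]
        length = (next_cut - cut) % n
        fragments.append(length if length else n)  # single site: full-length linear molecule
    return fragments


if __name__ == "__main__":
    # Toy "BAC": a 20 bp insert flanked by two NotI sites in a 44 bp circle.
    toy = "AAAA" + NOTI_SITE + "T" * 20 + NOTI_SITE + "CCCC"
    print(circular_digest_fragments(toy))
```

With two flanking NotI sites and no internal site in the insert, the digest yields two fragments, the excised insert and the vector backbone; a single site would only linearize the clone, and any internal NotI site in the insert would add further fragments.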
5. Delivery Of The Recombinant Vectors 

In order to effect expression of the polynucleotides and polynucleotide constructs of the 
invention, these constructs must be delivered into a cell. This delivery may be accomplished in 
vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the 
treatment of certain disease states.

One mechanism is viral infection where the expression construct is encapsulated in an 
infectious viral particle. 

Several non-viral methods for the transfer of polynucleotides into cultured mammalian 
cells are also contemplated by the present invention, and include, without being limited to, 
calcium phosphate precipitation (Graham et al., 1973; Chen et al., 1987), DEAE-dextran 
(Gopal, 1985), electroporation (Tur-Kaspa et al., 1986; Potter et al., 1984), direct
microinjection (Harland et al., 1985), DNA-loaded liposomes (Nicolau et al., 1982; Fraley et
al., 1979), and receptor-mediated transfection (Wu and Wu, 1987; 1988). Some of these
techniques may be successfully adapted for in vivo or ex vivo use. 

Once the expression polynucleotide has been delivered into the cell, it may be stably 
integrated into the genome of the recipient cell. This integration may be in the cognate location 
and orientation via homologous recombination (gene replacement) or it may be integrated in a 
random, non-specific location (gene augmentation). In yet further embodiments, the nucleic
acid may be stably maintained in the cell as a separate, episomal segment of DNA. Such 
nucleic acid segments or "episomes" encode sequences sufficient to permit maintenance and 
replication independent of or in synchronization with the host cell cycle. 

One specific embodiment for a method for delivering a protein or peptide to the interior 
of a cell of a vertebrate in vivo comprises the step of introducing a preparation comprising a 
physiologically acceptable carrier and a naked polynucleotide operatively coding for the 
polypeptide of interest into the interstitial space of a tissue comprising the cell, whereby the 
naked polynucleotide is taken up into the interior of the cell and has a physiological effect. This
is particularly applicable for transfer in vitro, but it may also be applied in vivo.

Compositions for use in vitro and in vivo comprising a "naked" polynucleotide are 
described in PCT application N° WO 90/11092 (Vical Inc.) and also in PCT application N°
WO 95/11307 (Institut Pasteur, INSERM, Universite d'Ottawa) as well as in the articles of
Tacson et al. (1996) and of Huygen et al. (1996).

In still another embodiment of the invention, the transfer of a naked polynucleotide of
the invention, including a polynucleotide construct of the invention, into cells may be carried
out by particle bombardment (biolistics), the particles being DNA-coated microprojectiles
accelerated to a high velocity allowing them to pierce cell membranes and enter cells without
killing them, as described by Klein et al. (1987).

In a further embodiment, the polynucleotide of the invention may be entrapped in a 
liposome (Ghosh and Bacchawat, 1991; Wong et al., 1980; Nicolau et al., 1987).

In a specific embodiment, the invention provides a composition for the in vivo 
production of the sbg1, g34665, sbg2, g35017 and g35018 protein or polypeptide described
herein. It comprises a naked polynucleotide operatively coding for this polypeptide, in solution 
in a physiologically acceptable carrier, and suitable for introduction into a tissue to cause cells 
of the tissue to express the said protein or polypeptide. 

The amount of vector to be injected into the desired host organism varies according to the
site of injection. As an indicative dose, between 0.1 and 100 µg of the vector may be injected
into an animal body, preferably a mammal body, for example a mouse body.

In another embodiment, the vector according to the invention may be introduced in
vitro into a host cell, preferably a host cell previously harvested from the animal to be treated
and more preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that has
been transformed with the vector coding for the desired sbg1 polypeptide or the desired
fragment thereof is reintroduced into the animal body in order to deliver the recombinant
protein within the body either locally or systemically.
Cell Hosts 

Another object of the invention comprises a host cell that has been transformed or
transfected with one of the polynucleotides described herein, and more precisely a
polynucleotide comprising an sbg1 polynucleotide selected from the group consisting of SEQ
ID Nos. 1 to 26, 36 to 40 and 54 to 229, or a fragment or a variant thereof. Included are host
cells that are transformed (prokaryotic cells) or that are transfected (eukaryotic cells) with a
recombinant vector such as one of those described above.

Generally, a recombinant host cell of the invention comprises any one of the 
polynucleotides or the recombinant vectors described herein.

Preferred host cells used as recipients for the expression vectors of the invention are the 
following: 

a) Prokaryotic host cells: Escherichia coli strains (e.g. DH5α strain), Bacillus subtilis,
Salmonella typhimurium, and strains from species like Pseudomonas, Streptomyces and
Staphylococcus.

b) Eukaryotic host cells: HeLa cells (ATCC N° CCL2; N° CCL2.1; N° CCL2.2), CV-1
cells (ATCC N° CCL70), COS cells (ATCC N° CRL1650; N° CRL1651), Sf-9 cells (ATCC
N° CRL1711), C127 cells (ATCC N° CRL-1804), 3T3 (ATCC N° CRL-6361), CHO (ATCC N°
CCL-61), human kidney 293 (ATCC N° 45504; N° CRL-1573) and BHK (ECACC N°
84100501; N° 84111301).

c) Other mammalian host cells. 

Sbg1, g34665, sbg2, g35017 and g35018 gene expression in mammalian, and typically
human, cells may be rendered defective by replacing an sbg1 nucleic acid counterpart in the
genome of an animal cell with an sbg1 polynucleotide according to the invention. These
genetic alterations may be generated by homologous recombination events using specific DNA
constructs that have been previously described.

Mammalian zygotes, such as murine zygotes, are one kind of cell host that may be used.
For example, murine zygotes may undergo microinjection with a purified DNA molecule of
interest, for example a purified DNA molecule that has previously been adjusted to a
concentration ranging from 1 ng/µl (for BAC inserts) to 3 ng/µl (for P1 bacteriophage inserts)
in 10 mM Tris-HCl, pH 7.4, 250 µM EDTA containing 100 mM NaCl, 30 µM spermine and
70 µM spermidine. When the DNA to be microinjected is large, polyamines and high salt
concentrations can be used in order to avoid mechanical breakage of this DNA, as described by
Schedl et al. (1993b).
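The arithmetic behind adjusting an insert to a working concentration for microinjection, and the approximate number of molecules this represents, can be sketched as follows. The sketch assumes a simple C1·V1 = C2·V2 dilution and an average mass of about 650 g/mol per base pair of double-stranded DNA; the helper names, the stock concentration and the 100 kb insert length (the upper end of the P1 insert range mentioned earlier) are illustrative assumptions.

```python
# Minimal sketch of microinjection-dose arithmetic (hypothetical helper names).
# Assumptions: C1*V1 = C2*V2 dilution; ~650 g/mol per base pair of dsDNA.
AVOGADRO = 6.022e23
G_PER_MOL_PER_BP = 650.0


def stock_volume_ul(stock_ng_per_ul: float, target_ng_per_ul: float,
                    final_volume_ul: float) -> float:
    """Volume of stock needed to reach the target concentration in the final volume."""
    if not 0 < target_ng_per_ul <= stock_ng_per_ul:
        raise ValueError("target must be positive and not exceed the stock concentration")
    return target_ng_per_ul * final_volume_ul / stock_ng_per_ul


def copies_per_ul(ng_per_ul: float, length_bp: int) -> float:
    """Approximate number of DNA molecules per microliter for an insert of given length."""
    grams_per_ul = ng_per_ul * 1e-9
    grams_per_mol = length_bp * G_PER_MOL_PER_BP
    return grams_per_ul / grams_per_mol * AVOGADRO


if __name__ == "__main__":
    # Hypothetical stock of a ~100 kb P1 insert at 60 ng/ul, diluted to 3 ng/ul in 100 ul.
    print(stock_volume_ul(60.0, 3.0, 100.0))       # 5.0 ul of stock
    print(f"{copies_per_ul(3.0, 100_000):.2e}")    # roughly 2.8e7 molecules per ul
```

Because the copy number scales inversely with insert length, the same mass concentration delivers fewer molecules of a larger BAC insert than of a smaller P1 insert.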

Any of the polynucleotides of the invention, including the DNA constructs described
herein, may be introduced in an embryonic stem (ES) cell line, preferably a mouse ES cell line.
ES cell lines are derived from pluripotent, uncommitted cells of the inner cell mass of
preimplantation blastocysts. Preferred ES cell lines are the following: ES-E14TG2a (ATCC N°
CRL-1821), ES-D3 (ATCC N° CRL-1934 and N° CRL-11632), YS001 (ATCC N° CRL-11776) and
36.5 (ATCC N° CRL-11116). To maintain ES cells in an uncommitted state, they are cultured
in the presence of growth-inhibited feeder cells, which provide the appropriate signals to
preserve this embryonic phenotype and serve as a matrix for ES cell adherence. Preferred
feeder cells are primary embryonic fibroblasts that are established from tissue of day 13 to
day 14 embryos of virtually any mouse strain and maintained in culture, as described by
Abbondanzo et al. (1993), and that are inhibited in growth by irradiation, as described by
Robertson (1987), or by the presence of an inhibitory concentration of LIF, as described by
Pease and Williams (1990).

The constructs in the host cells can be used in a conventional manner to produce the
gene product encoded by the recombinant sequence.

Following transformation of a suitable host and growth of the host to an appropriate cell 
density, the selected promoter is induced by appropriate means, such as temperature shift or 
chemical induction, and cells are cultivated for an additional period. 

Cells are typically harvested by centrifugation, disrupted by physical or chemical
means, and the resulting crude extract retained for further purification.

Microbial cells employed in the expression of proteins can be disrupted by any 
convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of 
cell lysing agents. Such methods are well known to the skilled artisan.

Transgenic Animals 

The terms "transgenic animals" or "host animals" are used herein to designate animals that
have had their genome genetically and artificially manipulated so as to include one of the nucleic
acids according to the invention. Preferred animals are non-human mammals and include those
belonging to a genus selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g.
rabbits) which have their genome artificially and genetically altered by the insertion of a
nucleic acid according to the invention. In one embodiment, the invention encompasses
non-human host mammals and animals comprising a recombinant vector of the invention or an
sbg1, g34665, sbg2, g35017 or g35018 gene disrupted by homologous recombination with a
knockout vector. The invention also encompasses non-human primates comprising a recombinant
vector of the invention or an sbg1, g34665, sbg2, g35017 or g35018 gene disrupted by
homologous recombination with a knockout vector.

The transgenic animals of the invention all include within a plurality of their cells a 
cloned recombinant or synthetic DNA sequence, more specifically one of the purified or 
isolated nucleic acids comprising an sbg1, g34665, sbg2, g35017 or g35018 polynucleotide or a
DNA sequence encoding an antisense polynucleotide such as described in the present 
specification. 

Generally, a transgenic animal according to the present invention comprises any one of the
polynucleotides, the recombinant vectors and the cell hosts described in the present invention. 

In a first preferred embodiment, these transgenic animals may be good experimental
models in order to study the diverse pathologies related to cell differentiation, in particular
transgenic animals into whose genome one or several copies of a polynucleotide encoding a
native sbg1, g34665, sbg2, g35017 or g35018 protein, or alternatively a mutant sbg1, g34665,
sbg2, g35017 or g35018 protein, have been inserted.

In a second preferred embodiment, these transgenic animals may express a desired 
polypeptide of interest under the control of regulatory polynucleotides which lead to good 
yields in the synthesis of this protein of interest, and optionally a tissue specific expression of 
this protein of interest. 

The design of the transgenic animals of the invention may be made according to
conventional techniques well known to those skilled in the art. For more details regarding
the production of transgenic animals, and specifically transgenic mice, reference may be made
to US Patent Nos. 4,873,191, issued Oct. 10, 1989; 5,464,764, issued Nov. 7, 1995; and
5,789,215, issued Aug. 4, 1998.

Transgenic animals of the present invention are produced by the application of 
procedures which result in an animal with a genome that has incorporated exogenous genetic 
material. The procedure involves obtaining the genetic material, or a portion thereof, which 
encodes either an sbg1, g34665, sbg2, g35017 or g35018 polynucleotide or antisense
polynucleotide such as described in the present specification. 

A recombinant polynucleotide of the invention is inserted into an embryonic stem (ES)
cell line. The insertion is preferably made using electroporation, as described by Thomas
et al. (1987). The cells subjected to electroporation are screened (e.g. by selection via selectable
markers, by PCR or by Southern blot analysis) to find positive cells which have integrated the
exogenous recombinant polynucleotide into their genome, preferably via a homologous
recombination event. An illustrative positive-negative selection procedure that may be used
according to the invention is described by Mansour et al. (1988).
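The logic of positive-negative selection can be sketched as a small decision function. The sketch assumes the classic replacement-vector design associated with Mansour et al. (1988): a neomycin-resistance (neo) cassette inside the homology region for positive selection with G418, and a herpes simplex virus thymidine kinase (HSV-tk) gene outside it for negative selection with ganciclovir. The marker choice and the function names are illustrative assumptions; the text above does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClonePhenotype:
    """Markers retained by an electroporated ES cell clone (hypothetical model)."""
    has_neo: bool     # positive marker integrated (confers G418 resistance)
    has_hsv_tk: bool  # negative marker retained (confers ganciclovir sensitivity)


def survives_selection(clone: ClonePhenotype) -> bool:
    """True if the clone survives G418 (needs neo) and ganciclovir (needs no HSV-tk)."""
    return clone.has_neo and not clone.has_hsv_tk


def likely_event(clone: ClonePhenotype) -> str:
    """Interpret the marker pattern; survivors still need PCR or Southern blot confirmation."""
    if not clone.has_neo:
        return "no integration"
    if clone.has_hsv_tk:
        # Random insertion usually carries the whole vector, including HSV-tk.
        return "random integration"
    # Homologous recombination replaces the target region and drops the
    # HSV-tk gene that lies outside the homology arms.
    return "candidate homologous recombinant"
```

The double selection only enriches for homologous recombinants; as stated above, the surviving clones are then confirmed by PCR or Southern blot analysis.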
Then, the positive cells are isolated, cloned and injected into 3.5-day-old blastocysts
from mice, as described by Bradley (1987). The blastocysts are then inserted into a female
host animal and allowed to grow to term.

Alternatively, the positive ES cells are brought into contact with embryos at the
2.5-day-old, 8-16 cell stage (morulae), as described by Wood et al. (1993) or by Nagy et
al. (1993), the ES cells being internalized to colonize extensively the blastocyst, including the
cells which will give rise to the germ line.

The offspring of the female host are tested to determine which animals are transgenic,
i.e. include the inserted exogenous DNA sequence, and which are wild-type.

Thus, the present invention also concerns a transgenic animal containing a nucleic acid,
a recombinant expression vector or a recombinant host cell according to the invention.

Recombinant Cell Lines Derived From The Transgenic Animals Of The Invention. 
A further object of the invention comprises recombinant host cells obtained from a
transgenic animal described herein. In one embodiment the invention encompasses cells
derived from non-human host mammals and animals comprising a recombinant vector of the
invention or a gene comprising an sbg1, g34665, sbg2, g35017 or g35018 nucleic acid sequence
disrupted by homologous recombination with a knockout vector.

Recombinant cell lines may be established in vitro from cells obtained from any tissue
of a transgenic animal according to the invention, for example by transfection of primary cell
cultures with vectors expressing oncogenes such as SV40 large T antigen, as described by Chou
(1989) and Shay et al. (1991).
Assays For Identification Of Compounds For Treatment Of Schizophrenia And Bipolar Disorder

The present invention provides assays which may be used to test compounds for their
ability to treat CNS disorders, and in particular to ameliorate symptoms of a CNS disorder
mediated by sbg1, g34665, sbg2, g35017 or g35018. In preferred embodiments, compounds are
tested for their ability to ameliorate symptoms of schizophrenia or bipolar disorder mediated by
sbg1, g34665, sbg2, g35017 or g35018. Compounds may also be tested for their ability to treat
related disorders, including among others psychotic disorders, mood disorders, autism, substance
dependence and alcoholism, mental retardation, and other psychiatric diseases including
cognitive, anxiety, eating, impulse-control, and personality disorders, as defined in the
Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) classification.

The present invention also provides cell and animal, including primate and mouse, models 
of schizophrenia, bipolar disorder and related disorders. 

In one aspect, provided are non-cell based, cell based and animal based assays for the
identification of compounds that affect sbg1 activity. Sbg1 activity may be affected by
any mechanism; in certain embodiments, sbg1 activity is affected by modulating sbg1 gene
expression or the activity of the sbg1 gene product.

The present methods allow the identification of compounds that affect sbg1 activity
directly or indirectly. Thus, the non-cell based, cell based and animal assays of the present
invention may also be used to identify compounds that act on an element of an sbg1 pathway
other than sbg1 itself. These compounds can then be used as a therapeutic treatment to
modulate sbg1 and other gene products involved in schizophrenia, bipolar disorder and related
disorders.

Cell and non-cell based assays 

In one aspect, cell based assays using recombinant or non-recombinant cells may be
used to identify compounds which modulate sbg1 activity.

In one aspect, a cell based assay of the invention encompasses a method for identifying
a test compound for the treatment of schizophrenia or bipolar disorder comprising (a) exposing
a cell to a test compound at a concentration and time sufficient to ameliorate an endpoint related
to schizophrenia or bipolar disorder, and (b) determining the level of sbg1 activity in the cell.
Sbg1 activity can be measured, for example, by assaying a cell for mRNA transcript level, sbg1
peptide expression, localization or protein activity. Preferably the test compound is a
compound capable of, or suspected to be capable of, ameliorating a symptom of schizophrenia,
bipolar disorder or a related disorder. Test compounds capable of modulating sbg1 activity may
be selected for use in developing medicaments. Such cell based assays are further described
herein in the section titled "Method For Screening Ligands That Modulate The Expression Of
The sbg1, g34665, sbg2, g35017 and g35018 Gene."
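When sbg1 activity is read out as mRNA transcript level, one common option (an assumption here; the document does not name a quantification method) is real-time RT-PCR analysed with the comparative Ct (2^-ΔΔCt) method, which normalizes the target gene to a reference gene and then to untreated control cells. The sketch below assumes roughly 100% amplification efficiency for both genes; the function name and Ct values are hypothetical.

```python
def relative_expression_ddct(
    ct_target_treated: float,
    ct_reference_treated: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of the target transcript in treated vs. control cells (2^-ddCt).

    Assumes near-100% PCR efficiency for both the target (e.g. sbg1) and the
    reference (housekeeping) amplicons.
    """
    delta_treated = ct_target_treated - ct_reference_treated
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold one cycle earlier
    # after treatment, with an unchanged reference gene.
    print(relative_expression_ddct(24.0, 18.0, 25.0, 18.0))  # 2.0 (two-fold increase)
```

If amplification efficiencies differ markedly between the two amplicons, an efficiency-corrected calculation would be more appropriate than this simple form.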

In another aspect, a cell based assay of the invention encompasses a method for
identifying a compound for the treatment of schizophrenia or bipolar disorder comprising (a)
exposing a cell to a level of sbg1 activity sufficient to cause a schizophrenia-related or bipolar
disorder-related endpoint, and (b) exposing said cell to a test compound. A test compound can
then be selected according to its ability to ameliorate said schizophrenia-related or bipolar
disorder-related endpoints. Sbg1 activity may be provided by any suitable method, including
but not limited to providing a vector containing an sbg1 nucleotide sequence, treating said cell
with a compound capable of increasing sbg1 expression, and treating said cell with an sbg1
peptide. Preferably said cell is treated with an sbg1 peptide comprising a contiguous span of at
least 4 amino acids of SEQ ID Nos. 27 to 35; most preferably said sbg1 peptide comprises
amino acid positions 124 to 153 of SEQ ID No. 34, as described in Example 7. Preferably the
test compound is a compound capable of, or suspected to be capable of, ameliorating a symptom
of schizophrenia, bipolar disorder or a related disorder; alternatively, the test compound is
suspected of exacerbating an endpoint of schizophrenia, bipolar disorder or a related disorder. A
test compound capable of ameliorating any detectable symptom or endpoint of schizophrenia,
bipolar disorder or a related disorder may be selected for use in developing medicaments.
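Peptide positions here are 1-based and inclusive, so "amino acid positions 124 to 153" denotes a 30-residue peptide. The sketch below shows how such a peptide, and every contiguous span of at least 4 residues, could be extracted from a protein sequence. The actual sequence of SEQ ID No. 34 is not reproduced here; the toy sequence and the function names are hypothetical.

```python
from typing import Iterator


def peptide_span(protein: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive positions."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("positions must satisfy 1 <= start <= end <= len(protein)")
    return protein[start - 1:end]  # Python slices are 0-based and end-exclusive


def contiguous_spans(protein: str, min_length: int = 4) -> Iterator[str]:
    """Yield every contiguous span of at least min_length residues."""
    n = len(protein)
    for i in range(n):
        for j in range(i + min_length, n + 1):
            yield protein[i:j]


if __name__ == "__main__":
    # Toy placeholder, NOT SEQ ID No. 34: residues 124-153 are all tryptophan.
    toy = "M" + "A" * 122 + "W" * 30 + "K" * 10
    peptide = peptide_span(toy, 124, 153)
    print(len(peptide), set(peptide))                   # 30 {'W'}
    print(sum(1 for _ in contiguous_spans("ACDEFG")))   # 6 spans of length >= 4
```

For a protein of length n there are (n - 3)(n - 2)/2 spans of at least 4 residues, which is why the enumeration is written as a lazy generator.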

In another embodiment, the invention provides cell and non-cell based sbg1 binding assays to
determine whether sbg1 peptides bind to the cell surface, and to identify compounds for the
treatment of schizophrenia, bipolar disorder and related disorders that interact with an sbg1
receptor. In one such embodiment, an sbg1 polynucleotide, or fragments thereof, is cloned into
expression vectors such as those described herein. The proteins are purified by size, charge,
immunochromatography or other techniques familiar to those skilled in the art. Following
purification, the proteins are labeled using techniques known to those skilled in the art. The
labeled proteins are incubated with cells or cell lines derived from a variety of organs or tissues
to allow the proteins to bind to any receptor present on the cell surface. Following the
incubation, the cells are washed to remove non-specifically bound protein. The labeled proteins
are detected by autoradiography. Alternatively, unlabeled proteins may be incubated with the
cells and detected with antibodies having a detectable label, such as a fluorescent molecule,
attached thereto. Specificity of cell surface binding may be analyzed by conducting a
competition analysis in which various amounts of unlabeled protein are incubated along with
the labeled protein. The amount of labeled protein bound to the cell surface decreases as the
amount of competitive unlabeled protein increases. As a control, various amounts of an
unlabeled protein unrelated to the labeled protein are included in some binding reactions. The
amount of labeled protein bound to the cell surface does not decrease in binding reactions
containing increasing amounts of unrelated unlabeled protein, indicating that the protein
encoded by the nucleic acid binds specifically to the cell surface.
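The expected outcome of this competition experiment can be illustrated with the standard one-site competitive binding model, in which specific binding of the labeled ligand is B = Bmax·[L*] / ([L*] + Kd·(1 + [I]/Ki)). For homologous competition the unlabeled protein has the same affinity as the labeled one (Ki = Kd), while an unrelated protein does not bind the site (Ki effectively infinite). This is a simplified sketch assuming equilibrium, a single class of sites and no ligand depletion; the parameter values are arbitrary.

```python
import math


def labeled_bound(bmax: float, kd: float, labeled: float,
                  competitor: float = 0.0, ki: float = math.inf) -> float:
    """Specific binding of a labeled ligand in the presence of a competitor.

    One-site competition at equilibrium, assuming free ~ total ligand.
    A competitor with ki = inf (no affinity) leaves binding unchanged.
    """
    return bmax * labeled / (labeled + kd * (1.0 + competitor / ki))


if __name__ == "__main__":
    kd = 1.0        # arbitrary units (e.g. nM)
    labeled = 1.0   # labeled protein held constant
    for competitor in (0.0, 1.0, 10.0, 100.0):
        same = labeled_bound(100.0, kd, labeled, competitor, ki=kd)   # unlabeled copy of the labeled protein
        unrelated = labeled_bound(100.0, kd, labeled, competitor)     # unrelated control protein
        print(f"{competitor:>6}: homologous {same:5.1f}  unrelated {unrelated:5.1f}")
```

The two columns reproduce the qualitative pattern described above: binding falls as the unlabeled protein increases, and stays flat with the unrelated protein.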

In another embodiment, the present invention comprises non-cell based binding assays,
wherein an sbg1 polypeptide is prepared and purified as in the cell based binding assays
described above. Following purification, the proteins are labeled and incubated with a cell
membrane extract or isolate derived from any desired cells from any organs, tissue or
combination of organs or tissues of interest to allow the sbg1 polypeptide to bind to any
receptor present on a membrane. Following the incubation, the membranes are washed to
remove non-specifically bound protein. The labeled proteins may be detected by
autoradiography. Specificity of membrane binding of sbg1 may be analyzed by conducting a
competition analysis in which various amounts of a test compound are incubated along with
the labeled protein. Any desired test compound, including test polypeptides, can be incubated
with the membranes. The test compounds may be detected with antibodies having a detectable
label, such as a fluorescent molecule, attached thereto. The amount of labeled sbg1 polypeptide
bound to the membranes decreases as the amount of competitive test compound increases. As a
control, various amounts of an unlabeled protein or a compound unrelated to the test compound
are included in some binding reactions. Test compounds capable of reducing the amount of sbg1
bound to cell membranes may be selected as candidate therapeutic compounds.
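One simple way to select candidates from such a membrane-binding screen is to express each measurement as a percentage of the specific binding observed without test compound. The sketch below assumes that nonspecific binding is measured separately and uses a 50% cutoff purely as an illustrative threshold; the compound names, counts and helper functions are hypothetical.

```python
def percent_of_control(bound: float, control_bound: float, nonspecific: float = 0.0) -> float:
    """Specific binding with a test compound as a percentage of the no-compound control."""
    specific_control = control_bound - nonspecific
    if specific_control <= 0:
        raise ValueError("control binding must exceed nonspecific binding")
    return 100.0 * (bound - nonspecific) / specific_control


def select_candidates(measurements: dict[str, float], control_bound: float,
                      nonspecific: float = 0.0, max_percent: float = 50.0) -> list[str]:
    """Names of compounds that reduce specific binding to at most max_percent of control."""
    return sorted(
        name for name, bound in measurements.items()
        if percent_of_control(bound, control_bound, nonspecific) <= max_percent
    )


if __name__ == "__main__":
    # Hypothetical counts per minute of labeled sbg1 polypeptide bound to membranes.
    counts = {"cpd_A": 1200.0, "cpd_B": 5200.0, "cpd_C": 3000.0, "unrelated_ctrl": 5900.0}
    print(select_candidates(counts, control_bound=6000.0, nonspecific=1000.0))
```

Compounds passing such a cutoff would still be compared with the unrelated-compound controls before being treated as candidates.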

In preferred embodiments of the cell and non-cell based assays, the sbg1 peptide
comprises a contiguous span of at least 4 amino acids of SEQ ID Nos. 27 to 35; most
preferably, the sbg1 peptide comprises amino acid positions 124 to 153 of SEQ ID No. 34.

Said cell based assays may comprise cells of any suitable origin; particularly preferred
cells are human cells, primate cells, non-human primate cells and mouse cells. If non-human
primate cells are used, the sbg1 may comprise a nucleotide sequence, or be encoded by a
nucleotide sequence, according to the primate nucleic acid sequences of SEQ ID Nos. 54 to 111,
or a sequence complementary thereto or a fragment thereof.
Animal model based assay 

Non-human animal based assays may also be used to identify compounds which
modulate sbg1 activity. The invention encompasses suitable animal models and animal based
assays, including non-transgenic or transgenic animals, such as animals containing a human
or altered form of the sbg1 gene.

Thus, the present invention comprises treating an animal affected by schizophrenia or
bipolar disorder or symptoms thereof with a test compound capable of directly or indirectly
modulating sbg1 activity.

In one aspect, an animal based assay of the invention encompasses a method for
identifying a test compound for the treatment of schizophrenia or bipolar disorder comprising
(a) exposing an animal to a test compound at a concentration and time sufficient to ameliorate
an endpoint related to schizophrenia or bipolar disorder, and (b) determining the level of sbg1
activity at a site in said animal. Sbg1 activity can be measured in any suitable cell, tissue or
site. Preferably the test compound is a compound capable of, or suspected to be capable of,
ameliorating a symptom of schizophrenia, bipolar disorder or a related disorder. Optionally,
said test compound is capable or suspected to be capable of modulating sbg1 activity. Test
compounds capable of modulating sbg1 activity may be selected for use in developing
medicaments.

In another aspect, an animal based assay of the invention encompasses a method for
identifying a compound for the treatment of schizophrenia or bipolar disorder comprising (a)
exposing an animal to a level of sbg1 activity sufficient to cause a schizophrenia-related or
bipolar disorder-related symptom or endpoint, and (b) exposing said animal to a test compound.
A test compound can then be selected according to its ability to ameliorate said
schizophrenia-related or bipolar disorder-related endpoints. Sbg1 activity may be provided by
any suitable method, including but not limited to providing a vector containing an sbg1
nucleotide sequence, treating said animal with a compound capable of increasing sbg1
expression, and treating said animal with an sbg1 peptide. Preferably, said animal is treated
with an sbg1 peptide comprising a contiguous span of at least 4 amino acids of SEQ ID Nos. 27
to 35; most preferably said sbg1 peptide comprises amino acid positions 124 to 153 of SEQ ID
No. 34, as described in Example 7. Preferably the test compound is a compound capable of, or
suspected to be capable of, ameliorating a symptom of schizophrenia, bipolar disorder or a
related disorder; alternatively, the test compound is suspected of exacerbating a symptom of
schizophrenia, bipolar disorder or a related disorder. A test compound capable of ameliorating
any detectable symptom or endpoint of schizophrenia, bipolar disorder or a related disorder may
be selected for use in developing medicaments.

Any suitable animal may be used. Preferably, said animal is a primate, a non-human
primate, a mammal, or a mouse.

In one embodiment, a mouse is treated with an sbgl peptide, exposed to a test
compound, and symptoms indicative of schizophrenia, bipolar disorder or a related disorder are
assessed by observing stereotypy. In other embodiments, said symptoms are assessed by
performing at least one test from the group consisting of home cage observation, neurological
evaluation, stress-induced hypothermia, forced swim, PTZ seizure, locomotor activity, tail
suspension, elevated plus maze, novel object recognition, prepulse inhibition, thermal pain,
Y-maze, and metabolic chamber tests (Psychoscreen™ tests available from Psychogenics Inc.).
Other tests are described, for example, in Crawley et al., Horm. Behav. 31(3): 197-211 (1997)
and Crawley, Brain Res. 835(1): 18-26 (1999).

In one example, the present inventors have tested sbgl peptides by injection into mice.
An sbgl peptide comprising amino acid positions 124 to 153 of SEQ ID No. 34 was injected
intraperitoneally into adult mice as described herein in Example 7. Upon observation, mice
injected with the sbgl peptide exhibited a decrease in the frequency of their movements over the
time course of the experiment. Figure 18 (top left panel) compares the average number of
movements at 3 separate time points (5, 10, and 15 min) with the average number of movements
per min in the last period of observation (30, 35, 40, and 45 min). The sbgl peptide also
increased stereotypy; this effect was most prominent during the last period of observation.
Because the onset of stereotypy was variable, data are presented as the average stereotypy for
observations over the entire time period.

The present inventors have also determined that the sbgl gene exists in several non- 
human primates. In a preferred embodiment of the animal models and drug screening assays of 
the invention, a non-human primate is treated with an sbgl peptide and exposed to a test 
compound, wherein said sbgl peptide is encoded by a nucleotide sequence according to the 
primate nucleic acid sequences of SEQ ID Nos. 54 to 111, or a sequence complementary thereto
or a fragment thereof. 

Any suitable test compound may be used with the screening methods of the invention.
Examples of compounds that may be screened by the methods of the present invention include
small organic or inorganic molecules; nucleic acids, including polynucleotides from random and
directed polynucleotide libraries; peptides, including peptides derived from random and directed
peptide libraries, soluble peptides, fusion peptides, and phosphopeptides; and antibodies,
including polyclonal, monoclonal, chimeric, humanized, anti-idiotypic and single-chain
antibodies, Fab, F(ab')2 and Fab expression library fragments, and epitope-binding fragments
thereof. In certain aspects, a compound capable of ameliorating or exacerbating a symptom or
endpoint of schizophrenia, bipolar disorder or a related disorder may include, by way of
example, antipsychotic drugs in general, neuroleptics, atypical neuroleptics, antidepressants,
anti-anxiety drugs, noradrenergic agonists and antagonists, dopaminergic agonists and
antagonists, serotonin reuptake inhibitors, and benzodiazepines.
Methods for screening substances interacting with an sbgl, g34665, sbg2, g35017
or g35018 polypeptide

For the purpose of the present invention, a ligand means a molecule, such as a protein, a
peptide, an antibody or any synthetic chemical compound, capable of binding to the sbgl,
g34665, sbg2, g35017 or g35018 protein or one of its fragments or variants, or of modulating the
expression of the polynucleotide coding for the sbgl, g34665, sbg2, g35017 or g35018 protein or
a fragment or variant thereof.

In the ligand screening method according to the present invention, a biological sample
or a defined molecule to be tested as a putative ligand of the sbgl, g34665, sbg2, g35017 or
g35018 protein is brought into contact with the corresponding purified sbgl, g34665, sbg2,
g35017 or g35018 protein, for example the corresponding purified recombinant sbgl, g34665,
sbg2, g35017 or g35018 protein produced by a recombinant cell host as described hereinbefore,
in order to form a complex between this protein and the putative ligand molecule to be tested.

As an illustrative example, to study the interaction of the sbgl, g34665, sbg2, g35017
or g35018 protein, or of a fragment comprising a contiguous span of at least 4 amino acids,
preferably at least 6, or preferably at least 8 to 10 amino acids, more preferably at least 12, 15,
20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos. 27 to 35 and 41 to 43, with drugs or small
molecules, such as molecules generated through combinatorial chemistry approaches, the
microdialysis coupled to HPLC method described by Wang et al. (1997) or the affinity capillary
electrophoresis method described by Bush et al. (1997) can be used.

In further methods, peptides, drugs, fatty acids, lipoproteins, or small molecules which
interact with the sbgl, g34665, sbg2, g35017 or g35018 protein, or a fragment comprising a
contiguous span of at least 4 amino acids, preferably at least 6, or preferably at least 8 to 10
amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID
Nos. 27 to 35 and 41 to 43, may be identified using assays such as the following. The molecule
to be tested for binding is labeled with a detectable label, such as a fluorescent, radioactive, or
enzymatic tag, and placed in contact with immobilized sbgl, g34665, sbg2, g35017 or g35018
protein, or a fragment thereof, under conditions which permit specific binding to occur. After
removal of non-specifically bound molecules, bound molecules are detected using appropriate
means.

Another object of the present invention comprises methods and kits for the screening of 
candidate substances that interact with an sbgl, g34665, sbg2, g35017 or g35018 polypeptide. 

The present invention pertains to methods for screening substances of interest that
interact with an sbgl, g34665, sbg2, g35017 or g35018 protein or a fragment or variant
thereof. By virtue of their capacity to bind covalently or non-covalently to an sbgl, g34665,
sbg2, g35017 or g35018 protein or to a fragment or variant thereof, these substances or
molecules may advantageously be used both in vitro and in vivo.

In vitro, said interacting molecules may be used as detection means in order to identify
the presence of an sbgl, g34665, sbg2, g35017 or g35018 protein in a sample, preferably a
biological sample.

A method for the screening of a candidate substance comprises the following steps:

a) providing a polypeptide comprising, consisting essentially of, or consisting of an
sbgl, g34665, sbg2, g35017 or g35018 protein or a fragment comprising a contiguous span of at
least 4 amino acids, preferably at least 6 amino acids, more preferably at least 8 to 10 amino
acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos. 27
to 35 and 41 to 43;

b) obtaining a candidate substance; 

c) bringing into contact said polypeptide with said candidate substance; and 

d) detecting the complexes formed between said polypeptide and said candidate 
substance. 
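Step a) admits any fragment comprising a contiguous span of at least 4 amino acids. As a minimal sketch (not part of the described method), the candidate fragments of a sequence can be enumerated as below. The sequence used in the example is a hypothetical placeholder, not one of SEQ ID Nos. 27 to 35 or 41 to 43, and positions are 1-based and inclusive, as in "amino acid positions 124 to 153".

```python
from typing import Iterator, Tuple


def contiguous_spans(sequence: str, min_length: int = 4) -> Iterator[Tuple[int, int, str]]:
    """Yield (first, last, residues) for every contiguous span of >= min_length residues.

    Positions are reported 1-based and inclusive.
    """
    n = len(sequence)
    for length in range(min_length, n + 1):
        for start in range(0, n - length + 1):
            end = start + length  # 0-based, exclusive
            yield start + 1, end, sequence[start:end]


def count_spans(n: int, min_length: int) -> int:
    """Closed-form number of spans of length >= min_length in a sequence of n residues."""
    if min_length > n:
        return 0
    k = n - min_length + 1
    return k * (k + 1) // 2


def span(sequence: str, first: int, last: int) -> str:
    """Return residues first..last (1-based, inclusive), e.g. span(seq, 124, 153)."""
    return sequence[first - 1:last]


if __name__ == "__main__":
    hypothetical_seq = "MKTAYIAK"  # placeholder sequence, for illustration only
    for first, last, residues in contiguous_spans(hypothetical_seq, 4):
        print(first, last, residues)
```

For a sequence of n residues there are k(k+1)/2 spans of length at least m, with k = n - m + 1, so a full-length protein yields thousands of possible fragments; a defined fragment such as positions 124 to 153 spans 30 residues.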

The invention further concerns a kit for the screening of a candidate substance 
interacting with the sbgl, g34665, sbg2, g35017 or g35018 polypeptide, wherein said kit 
comprises: 

a) an sbgl, g34665, sbg2, g35017 or g35018 protein having an amino acid sequence
selected from the group consisting of the amino acid sequences of SEQ ID Nos. 27 to 35 and 41
to 43 or a peptide fragment comprising a contiguous span of at least 4 amino acids, preferably at
least 6 amino acids, more preferably at least 8 to 10 amino acids, and more preferably at least
12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos. 27 to 35 and 41 to 43; and

b) optionally means useful to detect the complex formed between the sbgl, g34665,
sbg2, g35017 or g35018 protein or a peptide fragment or a variant thereof and the candidate
substance.

In a preferred embodiment of the kit described above, the detection means comprise 
monoclonal or polyclonal antibodies directed against the sbgl, g34665, sbg2, g35017 or g35018 
protein or a peptide fragment or a variant thereof. 

Various candidate substances or molecules can be assayed for interaction with an sbgl,
g34665, sbg2, g35017 or g35018 polypeptide. These substances or molecules include, without
being limited to, natural or synthetic organic compounds or molecules of biological origin such
as polypeptides. When the candidate substance or molecule comprises a polypeptide, this
polypeptide may be the expression product of a phage clone belonging to a phage-based
random peptide library, or alternatively the polypeptide may be the expression product of a
cDNA library cloned in a vector suitable for performing a two-hybrid screening assay.

The invention also pertains to kits useful for performing the hereinbefore described
screening method. Preferably, such kits comprise an sbgl, g34665, sbg2, g35017 or g35018
polypeptide or a fragment or a variant thereof, and optionally means useful to detect the
complex formed between the sbgl, g34665, sbg2, g35017 or g35018 polypeptide or its fragment
or variant and the candidate substance. In a preferred embodiment the detection means
comprise monoclonal or polyclonal antibodies directed against the corresponding sbgl, g34665,
sbg2, g35017 or g35018 polypeptide or a fragment or a variant thereof.

A. Candidate ligands obtained from random peptide libraries

In a particular embodiment of the screening method, the putative ligand is the
expression product of a DNA insert contained in a phage vector (Parmley and Smith, 1988).
Specifically, random peptide phage libraries are used. The random DNA inserts encode
peptides of 8 to 20 amino acids in length (Oldenburg K.R. et al., 1992; Valadon P. et al., 1996;
Lucas A.H., 1994; Westerink M.A.J., 1995; Felici F. et al., 1991). According to this particular
embodiment, the recombinant phages expressing a peptide that binds to the immobilized sbgl,
g34665, sbg2, g35017 or g35018 protein are retained, and the complex formed between the sbgl,
g34665, sbg2, g35017 or g35018 protein and the recombinant phage may subsequently be
immunoprecipitated by a polyclonal or a monoclonal antibody directed against the sbgl,
g34665, sbg2, g35017 or g35018 protein.

Once the ligand library in recombinant phages has been constructed, the phage
population is brought into contact with the immobilized sbgl, g34665, sbg2, g35017 or g35018
protein. The preparation of complexes is then washed in order to remove the non-specifically
bound recombinant phages. The phages that bind specifically to the sbgl, g34665, sbg2,
g35017 or g35018 protein are then eluted with a buffer (acid pH) or immunoprecipitated by the
monoclonal antibody produced by an anti-sbgl, g34665, sbg2, g35017 or g35018 hybridoma,
and this phage population is subsequently amplified by infection of bacteria (for
example E. coli). The selection step may be repeated several times, preferably 2 to 4 times, in
order to select the most specific recombinant phage clones. The last step comprises
characterizing the peptide produced by the selected recombinant phage clones, either by
expression in infected bacteria and isolation, by expressing the phage insert in another
host-vector system, or by sequencing the insert contained in the selected recombinant phages.
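The recommendation to repeat the selection 2 to 4 times can be illustrated with a deliberately idealized model, an assumption of this sketch rather than of the source: each round retains a fixed fraction of specific binders and a much smaller fraction of background phages, and amplification in E. coli is uniform, so the odds of picking a specific clone grow by the ratio of those two fractions every round. The function name and retention values are hypothetical.

```python
def specific_fraction(f0: float, p_specific: float, p_background: float, rounds: int) -> float:
    """Fraction of specific binders in the phage pool after `rounds` of panning.

    Idealized assumptions (not from the source text):
    - each round retains a fixed fraction p_specific of specific phages and
      p_background of non-specific phages after washing;
    - amplification in bacteria is uniform, so it rescales titres without
      changing the proportions in the pool.
    """
    if not 0.0 <= f0 <= 1.0:
        raise ValueError("f0 must be a fraction between 0 and 1")
    specific = f0 * p_specific ** rounds
    background = (1.0 - f0) * p_background ** rounds
    total = specific + background
    return specific / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical values: 1 specific clone per million, 10% vs 0.01% retention.
    for r in range(5):
        print(r, specific_fraction(1e-6, 0.1, 1e-4, r))
```

With these illustrative numbers the specific clones go from one in a million to about half of the pool after two rounds and to nearly all of it after three, which is consistent with the 2 to 4 rounds suggested above; real enrichment also depends on amplification bias and elution efficiency, which this model ignores.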

B. Candidate ligands obtained by competition experiments. 

Alternatively, peptides, drugs or small molecules which bind to the sbgl, g34665, sbg2,
g35017 or g35018 protein, or a fragment comprising a contiguous span of at least 4 amino
acids, preferably at least 6 amino acids, more preferably at least 8 to 10 amino acids, and more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos. 27 to 35 and
41 to 43, may be identified in competition experiments. In such assays, the sbgl, g34665, sbg2,
g35017 or g35018 protein, or a fragment thereof, is immobilized on a surface, such as a plastic
plate. Increasing amounts of the peptides, drugs or small molecules are placed in contact with
the immobilized sbgl, g34665, sbg2, g35017 or g35018 protein, or a fragment thereof, in the
presence of a detectably labeled known sbgl, g34665, sbg2, g35017 or g35018 protein ligand.
For example, the sbgl, g34665, sbg2, g35017 or g35018 ligand may be detectably labeled with
a fluorescent, radioactive, or enzymatic tag. The ability of the test molecule to bind the sbgl,
g34665, sbg2, g35017 or g35018 protein, or a fragment thereof, is determined by measuring the
amount of detectably labeled known ligand bound in the presence of the test molecule. A
decrease in the amount of known ligand bound to the sbgl, g34665, sbg2, g35017 or g35018
protein, or a fragment thereof, when the test molecule is present indicates that the test molecule
is able to bind to the sbgl, g34665, sbg2, g35017 or g35018 protein, or a fragment thereof.
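The read-out of this competition assay, less labeled ligand bound as the test molecule increases, can be described by a one-site competitive binding model. The sketch below assumes equilibrium, a single binding site and no ligand depletion (assumptions not stated in the source), and converts an observed IC50 into an inhibition constant with the Cheng-Prusoff relation. Function names are illustrative; concentrations may be in any consistent unit.

```python
def fraction_labeled_bound(labeled_conc: float, kd_labeled: float,
                           competitor_conc: float, ki_competitor: float) -> float:
    """Fractional occupancy of the immobilized protein by the labeled known ligand.

    One-site competitive model at equilibrium, free concentrations taken as
    equal to total concentrations (no depletion).
    """
    a = labeled_conc / kd_labeled        # labeled ligand term
    b = competitor_conc / ki_competitor  # test molecule term
    return a / (1.0 + a + b)


def predicted_ic50(ki_competitor: float, labeled_conc: float, kd_labeled: float) -> float:
    """Competitor concentration that halves labeled-ligand binding in this model."""
    return ki_competitor * (1.0 + labeled_conc / kd_labeled)


def cheng_prusoff_ki(ic50: float, labeled_conc: float, kd_labeled: float) -> float:
    """Convert an observed IC50 into an inhibition constant Ki (Cheng-Prusoff)."""
    return ic50 / (1.0 + labeled_conc / kd_labeled)


if __name__ == "__main__":
    # Hypothetical assay: labeled ligand at its Kd (10), test molecule with Ki = 5.
    for conc in (0.0, 1.0, 10.0, 100.0):
        print(conc, fraction_labeled_bound(10.0, 10.0, conc, 5.0))
```

Because the IC50 depends on the concentration of labeled ligand used, reporting Ki rather than IC50 makes results comparable between assays run with different amounts of the known ligand.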

C. Candidate ligands obtained by affinity chromatography.

Proteins or other molecules interacting with the sbgl, g34665, sbg2, g35017 or g35018
protein, or a fragment comprising a contiguous span of at least 4 amino acids, preferably at least
6 amino acids, more preferably at least 8 to 10 amino acids, and more preferably at least 12, 15,
20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos. 27 to 35 and 41 to 43, can also be found
using affinity columns which contain the sbgl, g34665, sbg2, g35017 or g35018 protein, or a
fragment thereof. The sbgl, g34665, sbg2, g35017 or g35018 protein, or a fragment thereof,
may be attached to the column using conventional techniques including chemical coupling to a
suitable column matrix such as agarose, Affi-Gel®, or other matrices familiar to those of skill
in the art. In some embodiments of this method, the affinity column contains chimeric proteins in
which the sbgl, g34665, sbg2, g35017 or g35018 protein, or a fragment thereof, is fused to
glutathione S-transferase (GST). A mixture of cellular proteins or a pool of expressed proteins
as described above is applied to the affinity column. Proteins or other molecules interacting with
the sbgl, g34665, sbg2, g35017 or g35018 protein, or a fragment thereof, attached to the
column can then be isolated and analyzed by 2-D gel electrophoresis as described in Ramunsen
et al. (1997). Alternatively, the proteins retained on the affinity column can be purified by
electrophoresis-based methods and sequenced. The same method can be used to isolate
antibodies, to screen phage display products, or to screen phage display human antibodies.
D. Candidate ligands obtained by optical biosensor methods
Proteins interacting with the sbgl, g34665, sbg2, g35017 or g35018 protein, or a
fragment comprising a contiguous span of at least 4 amino acids, preferably at least 6 amino
acids, more preferably at least 8 to 10 amino acids, and more preferably at least 12, 15, 20, 25,
30, 40, 50, or 100 amino acids of SEQ ID Nos. 27 to 35 and 41 to 43, can also be screened by
using an optical biosensor as described in Edwards and Leatherbarrow (1997) and also in
Szabo et al. (1995). This technique permits the detection of interactions between molecules in
real time, without the need for labeled molecules. It is based on the surface plasmon
resonance (SPR) phenomenon. Briefly, the candidate ligand molecule to be tested is
attached to a surface (such as a carboxymethyl dextran matrix). A light beam is directed
towards the side of the surface that does not contain the sample to be tested and is reflected by
said surface. The SPR phenomenon causes a decrease in the intensity of the reflected light at
a specific combination of angle and wavelength. The binding of candidate ligand molecules
causes a change in the refractive index at the surface, which is detected as a change in
the SPR signal. For screening of candidate ligand molecules or substances that are able to
interact with the sbgl, g34665, sbg2, g35017 or g35018 protein, or a fragment thereof, the sbgl,
g34665, sbg2, g35017 or g35018 protein, or a fragment thereof, is immobilized onto a surface.
This surface forms one side of a cell through which the candidate molecule to be assayed
flows. The binding of the candidate molecule to the sbgl, g34665, sbg2, g35017 or g35018
protein, or a fragment thereof, is detected as a change in the SPR signal. The candidate
molecules tested may be proteins, peptides, carbohydrates, lipids, or small molecules generated
by combinatorial chemistry. This technique may also be performed by immobilizing eukaryotic
or prokaryotic cells or lipid vesicles exhibiting an endogenous or a recombinantly expressed
sbgl, g34665, sbg2, g35017 or g35018 protein at their surface.
The main advantage of the method is that it allows the determination of the association
rate between the sbgl, g34665, sbg2, g35017 or g35018 protein and molecules interacting with
it. It is thus possible to select specifically ligand molecules interacting with the sbgl, g34665,
sbg2, g35017 or g35018 protein, or a fragment thereof, on the basis of strong or, conversely,
weak association constants.
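Association rates are commonly extracted from SPR sensorgrams with a 1:1 (Langmuir) interaction model, which is an assumption here, not something the text specifies. In that model the association-phase signal is R(t) = Req(1 - exp(-kobs t)) with kobs = ka·C + kd, so regressing kobs, fitted at several analyte concentrations C, against C gives ka as the slope, kd as the intercept and KD = kd/ka. A minimal sketch, assuming per-concentration kobs values are already available; function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def association_response(t: float, conc: float, ka: float, kd: float, rmax: float) -> float:
    """Association-phase SPR response for a 1:1 (Langmuir) interaction.

    t in s, conc in M, ka in 1/(M*s), kd in 1/s, rmax in response units.
    """
    kobs = ka * conc + kd
    r_eq = rmax * ka * conc / kobs  # equals rmax * C / (C + KD)
    return r_eq * (1.0 - math.exp(-kobs * t))


def fit_rate_constants(concs: Sequence[float], kobs_values: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of kobs = ka*C + kd; returns (ka, kd, KD = kd/ka)."""
    n = len(concs)
    if n < 2:
        raise ValueError("need at least two analyte concentrations")
    mean_c = sum(concs) / n
    mean_k = sum(kobs_values) / n
    sxx = sum((c - mean_c) ** 2 for c in concs)
    sxy = sum((c - mean_c) * (k - mean_k) for c, k in zip(concs, kobs_values))
    ka = sxy / sxx             # slope
    kd = mean_k - ka * mean_c  # intercept
    return ka, kd, kd / ka
```

A strong binder shows up as a large ka and/or small kd (low KD); selecting for "weak association constants" simply means keeping candidates at the other end of that scale.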

E. Candidate ligands obtained through a two-hybrid screening assay.

The yeast two-hybrid system is designed to study protein-protein interactions in vivo
(Fields and Song, 1989), and relies upon the fusion of a bait protein to the DNA binding domain
of the yeast Gal4 protein. This technique is also described in US Patent No. 5,667,973
and US Patent No. 5,283,173 (Fields et al.).
The general procedure of library screening by the two-hybrid assay may be performed
as described by Harper et al. (1993), by Cho et al. (1998) or by Fromont-Racine
et al. (1997).

The bait protein or polypeptide comprises, consists essentially of, or consists of an
sbgl, g34665, sbg2, g35017 or g35018 polypeptide or a fragment comprising a contiguous span
of at least 4 amino acids, preferably at least 6 amino acids, more preferably at least 8 to 10
amino acids, and more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ
ID Nos. 27 to 35 and 41 to 43.

More precisely, the nucleotide sequence encoding the sbgl, g34665, sbg2, g35017 or 
g35018 polypeptide or a fragment or variant thereof is fused to a polynucleotide encoding the 
DNA binding domain of the GAL4 protein, the fused nucleotide sequence being inserted in a 
suitable expression vector, for example pAS2 or pM3. 

Then, a human cDNA library is constructed in a specially designed vector, such that the
human cDNA insert is fused to a nucleotide sequence in the vector that encodes the
transcriptional activation domain of the GAL4 protein. Preferably, the vector used is the pACT
vector. The polypeptides encoded by the nucleotide inserts of the human cDNA library are
termed "prey" polypeptides.

A third vector contains a detectable marker gene, such as beta galactosidase gene or 
CAT gene that is placed under the control of a regulation sequence that is responsive to the 
binding of a complete Gal4 protein containing both the transcriptional activation domain and 
the DNA binding domain. For example, the vector pG5EC may be used. 

Two different yeast strains are also used. As an illustrative but non-limiting example,
the two yeast strains may be the following:

- Y190, the genotype of which is (MATa, leu2-3,112, ura3-52, trp1-901, his3-Δ200, ade2-101,
gal4Δ gal80Δ, URA3 GAL-lacZ, LYS GAL-HIS3, cyhr);

- Y187, the genotype of which is (MATα, gal4, gal80, his3, trp1-901, ade2-101, ura3-52,
leu2-3,112, URA3 GAL-lacZ, met-), which is the opposite mating type of Y190.

Briefly, 20 µg of pAS2/sbgl, g34665, sbg2, g35017 or g35018 and 20 µg of
pACT-cDNA library are co-transformed into yeast strain Y190. The transformants are selected for
growth on minimal media lacking histidine, leucine and tryptophan, but containing the histidine
synthesis inhibitor 3-AT (50 mM). Positive colonies are screened for beta-galactosidase activity
by filter lift assay. The double positive colonies (His+, beta-gal+) are then grown on plates
lacking histidine and leucine, but containing tryptophan and cycloheximide (10 mg/ml), to select
for loss of the pAS2/sbgl, g34665, sbg2, g35017 or g35018 plasmids but retention of the
pACT-cDNA library plasmids. The resulting Y190 strains are mated with Y187 strains expressing
sbgl, g34665, sbg2, g35017 or g35018, or unrelated control proteins such as cyclophilin B,
lamin or SNF1, as Gal4 fusions as described by Harper et al. (1993) and by Bram et al. (1993),
and screened for beta-galactosidase activity by filter lift assay. Yeast clones that are beta-gal+
after mating with the control Gal4 fusions are considered false positives.
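The selection cascade above, His+ and beta-gal+ with the bait followed by a mating test against unrelated control fusions such as cyclophilin B, lamin or SNF1, amounts to a simple decision rule: a clone that reacts with an unrelated control is treated as nonspecific. A minimal sketch with hypothetical clone records; the record fields and labels are illustrative, not taken from any published protocol.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PreyClone:
    """Hypothetical record of one prey (pACT-cDNA) clone's reporter results."""
    name: str
    his_plus: bool                 # growth without histidine in the presence of 3-AT
    beta_gal_with_bait: bool       # filter lift result with the bait fusion
    # control fusion name -> filter lift result after mating with Y187
    beta_gal_with_controls: Dict[str, bool] = field(default_factory=dict)


def classify(clone: PreyClone) -> str:
    """Apply the double-reporter selection, then the control mating test."""
    if not (clone.his_plus and clone.beta_gal_with_bait):
        return "negative"
    if any(clone.beta_gal_with_controls.values()):
        # activates the reporter with an unrelated Gal4 fusion: nonspecific
        return "false positive"
    return "candidate interactor"
```

Only clones classified as candidate interactors would go on to insert sequencing; the rule is deliberately conservative, since a single reaction with any control discards the clone.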

In another embodiment of the two-hybrid method according to the invention, interaction
between the sbgl, g34665, sbg2, g35017 or g35018 protein, or a fragment or variant thereof, and
cellular proteins may be assessed using the Matchmaker Two Hybrid System 2 (Catalog No.
K1604-1, Clontech). As described in the manual accompanying the Matchmaker Two Hybrid
System 2 (Catalog No. K1604-1, Clontech), nucleic acids encoding the sbgl, g34665, sbg2,
g35017 or g35018 protein, or a portion thereof, are inserted into an expression vector such that
they are in frame with DNA encoding the DNA binding domain of the yeast transcriptional
activator GAL4. A desired cDNA, preferably human cDNA, is inserted into a second expression
vector such that it is in frame with DNA encoding the activation domain of GAL4. The two
expression plasmids are transformed into yeast, and the yeast are plated on selection medium
which selects for expression of the selectable markers on each of the expression vectors as well
as GAL4-dependent expression of the HIS3 gene. Transformants capable of growing on medium
lacking histidine are screened for GAL4-dependent lacZ expression. Cells which are positive in
both the histidine selection and the lacZ assay indicate an interaction between the sbgl, g34665,
sbg2, g35017 or g35018 protein and the protein or peptide encoded by the initially selected
cDNA insert.

Method For Screening Substances Interacting With The Regulatory Sequences Of 
An sbgl, g34665, sbg2, g35017 or g35018 Gene. 

The present invention also concerns a method for screening substances or molecules 
that are able to interact with the regulatory sequences of the sbgl, g34665, sbg2, g35017 or 
g35018 gene, such as for example promoter or enhancer sequences. 

Nucleic acids encoding proteins which are able to interact with the regulatory sequences
of the sbgl, g34665, sbg2, g35017 or g35018 gene, more particularly a nucleotide sequence
selected from the group consisting of the polynucleotides of the 5' and 3' regulatory regions or a
fragment or variant thereof, and preferably a variant comprising one of the biallelic markers of
the invention, may be identified by using a one-hybrid system, such as that described in the
booklet enclosed in the Matchmaker One-Hybrid System kit from Clontech (Catalog Ref. No.
K1603-1). Briefly, the target nucleotide sequence is cloned upstream of a selectable reporter
sequence and the resulting DNA construct is integrated into the yeast genome (Saccharomyces
cerevisiae). The yeast cells containing the reporter sequence in their genome are then
transformed with a library comprising fusion molecules between cDNAs encoding candidate
proteins for binding to the regulatory sequences of the sbgl, g34665, sbg2, g35017 or g35018
gene and sequences encoding the activation domain of a yeast transcription factor such as GAL4.
The recombinant yeast cells are plated on a culture medium that selects for cells expressing the
reporter sequence. The recombinant yeast cells thus selected contain a fusion protein that is
able to bind to the target regulatory sequence of the sbgl, g34665, sbg2, g35017 or g35018
gene. The cDNAs encoding the fusion proteins are then sequenced and may be cloned into
expression or transcription vectors in vitro. The binding of the encoded polypeptides to the
target regulatory sequences of the sbgl, g34665, sbg2, g35017 or g35018 gene may be
confirmed by techniques familiar to those skilled in the art, such as gel retardation assays or
DNase protection assays.

Gel retardation assays may also be performed independently in order to screen
candidate molecules that are able to interact with the regulatory sequences of the sbgl, g34665,
sbg2, g35017 or g35018 gene, as described by Fried and Crothers (1981), Garner and
Revzin (1981) and Dent and Latchman (1993). These techniques are based on the principle
that a DNA fragment bound to a protein migrates more slowly than the same unbound DNA
fragment. Briefly, the target nucleotide sequence is labeled. The labeled target nucleotide
sequence is then brought into contact either with a total nuclear extract from cells
containing transcription factors, or with the different candidate molecules to be tested. The
interaction between the target regulatory sequence of the sbgl, g34665, sbg2, g35017 or g35018
gene and the candidate molecule or the transcription factor is detected after gel or capillary
electrophoresis as a retardation in migration.
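The retardation is usually quantified from band intensities. As an illustrative sketch (the densitometry read-out and background subtraction are assumptions, not part of the source), the fraction of labeled probe shifted by a candidate molecule or nuclear extract is the shifted-band signal divided by the total probe signal in the lane. The function name is hypothetical.

```python
def fraction_shifted(shifted_intensity: float, free_intensity: float,
                     background: float = 0.0) -> float:
    """Fraction of labeled probe retarded by protein binding in one gel lane.

    Assumes band intensities are linear in the amount of labeled probe and
    that a single background value applies to both bands.
    """
    shifted = max(shifted_intensity - background, 0.0)
    free = max(free_intensity - background, 0.0)
    total = shifted + free
    if total == 0:
        raise ValueError("no probe signal above background")
    return shifted / total
```

Comparing this fraction between lanes with and without a candidate molecule indicates whether the molecule binds the target regulatory sequence; a titration across candidate concentrations would be needed to go further and estimate an apparent affinity.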

Method For Screening Ligands That Modulate The Expression Of The sbgl,
g34665, sbg2, g35017 or g35018 Gene

Another subject of the present invention is a method for screening molecules that 
modulate the expression of the sbgl, g34665, sbg2, g35017 or g35018 protein. Such a 
screening method comprises the steps of: 

a) cultivating a prokaryotic or a eukaryotic cell that has been transfected with a
nucleotide sequence encoding the sbgl, g34665, sbg2, g35017 or g35018 protein or a variant or
a fragment thereof, placed under the control of its own promoter;

b) bringing the cultivated cell into contact with a molecule to be tested;

c) quantifying the expression of the sbgl, g34665, sbg2, g35017 or g35018 protein or a
variant or a fragment thereof.

In an embodiment, the nucleotide sequence encoding the sbgl, g34665, sbg2, g35017 or
g35018 protein or a variant or a fragment thereof comprises an allele of at least one sbgl,
g34665, sbg2, g35017 or g35018-related biallelic marker.

Using DNA recombination techniques well known to those skilled in the art, the sbgl,
g34665, sbg2, g35017 or g35018 protein-encoding DNA sequence is inserted into an expression
vector, downstream of its promoter sequence. As an illustrative example, the promoter
sequence of the sbgl, g34665, sbg2, g35017 or g35018 gene is contained in the nucleic acid of
the 5' regulatory region.

The expression of the sbgl, g34665, sbg2, g35017 or g35018 protein may be quantified
either at the mRNA level or at the protein level. In the latter case, polyclonal or monoclonal
antibodies may be used to quantify the amounts of the sbgl, g34665, sbg2, g35017 or g35018
protein that have been produced, for example in an ELISA or an RIA.

In a preferred embodiment, the sbgl, g34665, sbg2, g35017 or g35018 mRNA is quantified
by quantitative PCR amplification of the cDNA obtained by reverse transcription of the total
mRNA of the cultivated sbgl, g34665, sbg2, g35017 or g35018-transfected host cell, using a
pair of primers specific for sbgl, g34665, sbg2, g35017 or g35018.
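The text specifies RT-qPCR with gene-specific primers but not how threshold-cycle (Ct) values are turned into relative expression. One widely used option, swapped in here as an assumption, is the comparative Ct (2^-ΔΔCt) method: the target Ct is normalized to a reference transcript (a housekeeping gene, not mentioned in the source) and then compared between treated and control cells, assuming roughly 100% amplification efficiency (a factor of 2 per cycle). The function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float,
                     efficiency: float = 2.0) -> float:
    """Relative expression of the target transcript, treated vs control cells.

    Comparative Ct method: assumes target and reference amplify with the same
    efficiency (2.0 means perfect doubling each cycle).
    """
    delta_treated = ct_target_treated - ct_ref_treated  # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return efficiency ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: target appears 2 cycles earlier after treatment.
    print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))
```

A fold change above 1 indicates that the tested molecule increased the level of the transcript; if the primer pairs amplify with different efficiencies, an efficiency-corrected model would be needed instead.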

The present invention also concerns a method for screening substances or molecules
that are able to increase, or conversely to decrease, the level of expression of the sbgl,
g34665, sbg2, g35017 or g35018 gene. Such a method may allow one skilled in the art to select
substances exerting a regulatory effect on the expression level of the sbgl, g34665, sbg2,
g35017 or g35018 gene, which may be useful as active ingredients in pharmaceutical
compositions for treating patients suffering from schizophrenia, bipolar disorder or related
diseases.

Thus, the present invention also encompasses a method for screening a candidate
substance or molecule that modulates the expression of the sbgl, g34665, sbg2, g35017 or
g35018 gene, which method comprises the following steps:

- providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid
comprises a nucleotide sequence of the 5' regulatory region or a biologically active fragment or
variant thereof located upstream of a polynucleotide encoding a detectable protein;

- obtaining a candidate substance; and 

- determining the ability of the candidate substance to modulate the expression levels of 
the polynucleotide encoding the detectable protein. 
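The last step, determining whether a candidate modulates reporter expression, is in practice a comparison with untreated (vehicle) cells. A minimal sketch, assuming replicate reporter readings (for example GFP fluorescence or CAT activity) are available; the 2-fold cut-off and the function names are hypothetical choices for illustration, not values from the source.

```python
from statistics import mean
from typing import Sequence


def reporter_fold_change(treated: Sequence[float], vehicle: Sequence[float]) -> float:
    """Mean reporter signal with the candidate divided by the mean vehicle signal."""
    baseline = mean(vehicle)
    if baseline == 0:
        raise ValueError("vehicle signal must be non-zero")
    return mean(treated) / baseline


def call_effect(fold: float, threshold: float = 2.0) -> str:
    """Label a candidate using a symmetric, hypothetical fold-change cut-off."""
    if fold >= threshold:
        return "increases expression"
    if fold <= 1.0 / threshold:
        return "decreases expression"
    return "no clear effect"
```

A real screen would also normalize for cell number or transfection efficiency and apply a statistical test across replicates rather than a fixed cut-off.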

In a further embodiment, the nucleic acid comprising the nucleotide sequence of the 5'
regulatory region or a biologically active fragment or variant thereof also includes a 5'UTR
region of the sbgl cDNA of SEQ ID Nos. 2 to 26 or of the g35018 cDNA of SEQ ID Nos. 36 to 40,
or a biologically active fragment or variant thereof.
Preferred polynucleotides encoding a detectable protein include polynucleotides
encoding beta-galactosidase, green fluorescent protein (GFP) and chloramphenicol
acetyltransferase (CAT).

The invention also pertains to kits useful for performing the herein described screening 
method. Preferably, such kits comprise a recombinant vector that allows the expression of a 
nucleotide sequence of the 5' regulatory region or a biologically active fragment or variant 
thereof located upstream and operably linked to a polynucleotide encoding a detectable protein 
or the sbgl, g34665, sbg2, g35017 or g35018 protein or a fragment or a variant thereof.

In another embodiment, a method for the screening of a candidate substance or
molecule that modulates the expression of the sbgl, g34665, sbg2, g35017 or g35018 gene
comprises the following steps:

a) providing a recombinant host cell containing a nucleic acid, wherein said nucleic acid
comprises a 5'UTR sequence of an sbgl, g34665, sbg2, g35017 or g35018 cDNA, preferably of
an sbgl or g35018 cDNA of SEQ ID Nos. 2 to 26 or 36 to 40, or one of its biologically active
fragments or variants, the 5'UTR sequence or its biologically active fragment or variant being
operably linked to a polynucleotide encoding a detectable protein;

b) obtaining a candidate substance; and 

c) determining the ability of the candidate substance to modulate the expression levels 
of the polynucleotide encoding the detectable protein. 

In a specific embodiment of the above screening method, the nucleic acid that 
comprises a nucleotide sequence selected from the group consisting of the 5'UTR sequence of 
an sbgl, g34665, sbg2, g35017 or g35018 cDNA, preferably of an sbgl or g35018 cDNA of 
SEQ ID Nos 2 to 26 or 36 to 40 or one of its biologically active fragments or variants, includes 
a promoter sequence which is endogenous with respect to the sbgl, g34665, sbg2, g35017 or
g35018 5'UTR sequence.

In another specific embodiment of the above screening method, the nucleic acid that 
comprises a nucleotide sequence selected from the group consisting of the 5'UTR sequence of 
an sbgl, g34665, sbg2, g35017 or g35018 cDNA or one of its biologically active fragments or
variants, includes a promoter sequence which is exogenous with respect to the sbgl, g34665,
sbg2, g35017 or g35018 5'UTR sequence defined therein.

In a further preferred embodiment, the nucleic acid comprising the 5'UTR sequence of
an sbgl, g34665, sbg2, g35017 or g35018 cDNA or the biologically active fragments thereof
includes an sbgl-related biallelic marker.

The invention further comprises a kit for the screening of a candidate substance 
modulating the expression of the sbgl, g34665, sbg2, g35017 or g35018 gene, wherein said kit
comprises a recombinant vector that comprises a nucleic acid including a 5'UTR sequence of
the sbgl, g34665, sbg2, g35017 or g35018 cDNA of SEQ ID Nos 2 to 26 or 36 to 40, or one of
their biologically active fragments or variants, the 5'UTR sequence or its biologically active 
fragment or variant being operably linked to a polynucleotide encoding a detectable protein. 

For the design of suitable recombinant vectors for performing the screening methods
described above, refer to the section of the present specification in which the preferred
recombinant vectors of the invention are detailed.

Expression levels and patterns of sbgl, g34665, sbg2, g35017 or g35018 may be 
analyzed by solution hybridization with long probes as described in International Patent 
Application No. WO 97/05277. Briefly, the sbgl, g34665, sbg2, g35017 or g35018 cDNA or 
the sbgl, g34665, sbg2, g35017 and g35018 genomic DNA described above, or fragments
thereof, is inserted at a cloning site immediately downstream of a bacteriophage (T3, T7 or SP6)
RNA polymerase promoter to produce antisense RNA. Preferably, the sbgl, g34665, sbg2,
g35017 and g35018 insert comprises at least 100 consecutive nucleotides of the
genomic DNA sequence or the cDNA sequences. The plasmid is linearized and transcribed in
the presence of ribonucleotides comprising modified ribonucleotides (e.g., biotin-UTP and
DIG-UTP). An excess of this doubly labeled RNA is hybridized in solution with mRNA isolated
from cells or tissues of interest. The hybridization is performed under standard stringent
conditions (40-50°C for 16 hours in an 80% formamide, 0.4 M NaCl buffer, pH 7-8). The
unhybridized probe is removed by digestion with ribonucleases specific for single-stranded
RNA (e.g., RNases CL3, T1, PhyM, U2 or A). The presence of the biotin-UTP modification
enables capture of the hybrid on a microtitration plate coated with streptavidin. The presence of 
the DIG modification enables the hybrid to be detected and quantified by ELISA using an anti- 
DIG antibody coupled to alkaline phosphatase. 
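The probe produced by this run-off transcription is the antisense copy of the cloned insert. As a minimal sketch, assuming the insert is supplied as its sense-strand DNA sequence and is cloned so that the phage promoter transcribes the opposite strand, the expected probe sequence and the number of uridine positions available for biotin-UTP or DIG-UTP incorporation can be computed as follows; the function names are hypothetical.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_riboprobe(sense_insert, min_length=100):
    """Return the 5'->3' antisense RNA transcribed from a sense-strand insert.

    The transcript is the reverse complement of the insert, written with U
    in place of T. min_length reflects the preferred insert size of at
    least 100 consecutive nucleotides.
    """
    insert = sense_insert.upper()
    if set(insert) - set("ACGT"):
        raise ValueError("insert must contain only A, C, G and T")
    if len(insert) < min_length:
        raise ValueError(f"insert shorter than {min_length} nucleotides")
    antisense_dna = insert.translate(_DNA_COMPLEMENT)[::-1]
    return antisense_dna.replace("T", "U")


def labelable_positions(rna):
    """Count U positions, i.e. sites where modified UTP can be incorporated."""
    return rna.count("U")
```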

Quantitative analysis of sbgl, g34665, sbg2, g35017 or g35018 gene expression may 
also be performed using arrays. As used herein, the term array means a one-dimensional, two-
dimensional, or multidimensional arrangement of a plurality of nucleic acids of sufficient length
to permit specific detection of expression of mRNAs capable of hybridizing thereto. For
example, the arrays may contain a plurality of nucleic acids derived from genes whose
expression levels are to be assessed. The arrays may include the sbgl, g34665, sbg2, g35017
and g35018 genomic DNA, the sbgl, g34665, sbg2, g35017 or g35018 cDNA sequences or the
sequences complementary thereto or fragments thereof, particularly those comprising at least
one of the biallelic markers according to the present invention. Preferably, the fragments are at
least 15 nucleotides in length. In other embodiments, the fragments are at least 25 nucleotides
in length. In some embodiments, the fragments are at least 50 nucleotides in length. More
preferably, the fragments are at least 100 nucleotides in length. In another preferred
embodiment, the fragments are more than 100 nucleotides in length. In some embodiments the
fragments may be more than 500 nucleotides in length.

For example, quantitative analysis of sbgl, g34665, sbg2, g35017 or g35018 gene
expression may be performed with a complementary DNA microarray as described by Schena
et al. (1995 and 1996). Full length sbgl, g34665, sbg2, g35017 or g35018 cDNAs or fragments
thereof are amplified by PCR and arrayed from a 96-well microtiter plate onto silylated
microscope slides using high-speed robotics. Printed arrays are incubated in a humid chamber
to allow rehydration of the array elements and rinsed, once in 0.2% SDS for 1 min, twice in
water for 1 min and once for 5 min in sodium borohydride solution. The arrays are submerged
in water for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed twice with water, air
dried and stored in the dark at 25°C. 

Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a
single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x
14 mm glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low
stringency wash buffer (1x SSC/0.2% SDS), then for 10 min at room temperature in high
stringency wash buffer (0.1x SSC/0.2% SDS). Arrays are scanned in 0.1x SSC using a
fluorescence laser scanning device fitted with a custom filter set. Accurate differential 
expression measurements are obtained by taking the average of the ratios of two independent 
hybridizations. 
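The final averaging step can be illustrated with a short sketch. It assumes each hybridization yields, per arrayed cDNA, a background-corrected pair of signals (test sample, reference sample); the data layout and function name are hypothetical, and normalization between fluorescence channels is deliberately left out.

```python
def average_expression_ratios(hyb_a, hyb_b):
    """Average per-gene test/reference ratios over two independent hybridizations.

    hyb_a and hyb_b map a gene identifier to a (test_signal, reference_signal)
    pair. Only genes measured in both hybridizations are reported.
    """
    result = {}
    for gene in hyb_a.keys() & hyb_b.keys():
        ratios = []
        for test, reference in (hyb_a[gene], hyb_b[gene]):
            if reference <= 0:
                raise ValueError(f"non-positive reference signal for {gene}")
            ratios.append(test / reference)
        result[gene] = sum(ratios) / len(ratios)
    return result
```

With ratios of 2.0 in one hybridization and 3.0 in the other, the reported value for that gene is 2.5.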

Quantitative analysis of sbgl, g34665, sbg2, g35017 or g35018 gene expression may
also be performed with full length sbgl, g34665, sbg2, g35017 or g35018 cDNAs or fragments
thereof on complementary DNA arrays as described by Pietu et al. (1996). The full length sbgl,
g34665, sbg2, g35017 or g35018 cDNA or fragments thereof are PCR amplified and spotted on
membranes. Then, mRNAs originating from various tissues or cells are labeled with 
radioactive nucleotides. After hybridization and washing in controlled conditions the 
hybridized mRNAs are detected by phospho-imaging or autoradiography. Duplicate 
experiments are performed and a quantitative analysis of differentially expressed mRNAs is 
then performed. 

Alternatively, expression analysis using the sbgl, g34665, sbg2, g35017 or g35018
genomic DNA, the sbgl, g34665, sbg2, g35017 or g35018 cDNA, or fragments thereof can be
done through high density nucleotide arrays as described by Lockhart et al. (1996) and
Sosnowski et al. (1997). Oligonucleotides of 15-50 nucleotides from the sequences of the sbgl,
g34665, sbg2, g35017 or g35018 genomic DNA, the sbgl, g34665, sbg2, g35017 or g35018
cDNA sequences, particularly those comprising at least one of the biallelic markers according to the



WO 00/58510 



173 



PCT/IB00/00435 



present invention, or the sequences complementary thereto, are synthesized directly on the chip 
(Lockhart et al., supra) or synthesized and then addressed to the chip (Sosnowski et al., supra). 
Preferably, the oligonucleotides are about 20 nucleotides in length. 
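Selecting oligonucleotides that carry a given biallelic marker amounts to enumerating every window of the chosen length that covers the marker position. The sketch below assumes 1-based marker positions and the preferred length of about 20 nucleotides; it is an illustration only, and the function name is hypothetical. A practical probe design would typically also filter candidates on melting temperature, base composition and uniqueness.

```python
def probes_covering(sequence, marker_pos, length=20):
    """Return (start, oligo) for every window of `length` nt covering marker_pos.

    Positions are 1-based, as in the SEQ ID numbering; start is the 1-based
    position of the first nucleotide of each window.
    """
    n = len(sequence)
    if not 1 <= marker_pos <= n:
        raise ValueError("marker position outside the sequence")
    first = max(1, marker_pos - length + 1)
    last = min(marker_pos, n - length + 1)
    # An empty list is returned when the sequence is shorter than `length`.
    return [(s, sequence[s - 1:s - 1 + length]) for s in range(first, last + 1)]
```

For a 30-nt sequence with the marker at position 15, this yields 11 overlapping 20-mers starting at positions 1 through 11.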

sbgl, g34665, sbg2, g35017 or g35018 cDNA probes labeled with an appropriate
compound, such as biotin, digoxigenin or fluorescent dye, are synthesized from the appropriate
mRNA population and then randomly fragmented to an average size of 50 to 100 nucleotides.
The probes are then hybridized to the chip. After washing as described in Lockhart et al.,
supra, and application of different electric fields (Sosnowski et al., 1997), the dyes or labeling
compounds are detected and quantified. Duplicate hybridizations are performed. Comparative
analysis of the intensity of the signal originating from cDNA probes on the same target
oligonucleotide in different cDNA samples indicates differential expression of sbgl, g34665,
sbg2, g35017 or g35018 mRNA.

Methods For Inhibiting The Expression Of An sbgl, g34665, sbg2, g35017 or 
g35018 Gene 

Other therapeutic compositions according to the present invention advantageously comprise
an oligonucleotide fragment of the nucleic sequence of sbgl, g34665, sbg2,
g35017 or g35018 as an antisense tool or a triple helix tool that inhibits the expression of the
corresponding sbgl, g34665, sbg2, g35017 or g35018 gene. A preferred fragment of the
nucleic sequence of sbgl, g34665, sbg2, g35017 or g35018 comprises an allele of at least one of
the biallelic markers of the invention.

Antisense Approach 

Preferred methods using antisense polynucleotides according to the present invention are
the procedures described by Sczakiel et al. (1995).

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long)
that are complementary to the 5' end of the sbgl, g34665, sbg2, g35017 or g35018 mRNA. In
another embodiment, a combination of different antisense polynucleotides complementary to
different parts of the desired targeted gene is used.

Preferred antisense polynucleotides according to the present invention are 
complementary to a sequence of the mRNAs of sbgl, g34665, sbg2, g35017 or g35018 that 
contains either the translation initiation codon ATG or a splicing donor or acceptor site. 
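A minimal sketch of the start-codon choice: locate the first ATG in a sense-strand cDNA sequence, take a window of the desired length around it, and return the reverse complement as the antisense oligonucleotide. It assumes the first ATG is the translation initiation codon, which has to be checked against the annotated open reading frame; the function names and the 20-nt default (within the 15-200 bp range above) are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def antisense_over_start_codon(cdna, length=20):
    """Antisense DNA oligo (5'->3') complementary to a window spanning the first ATG."""
    cdna = cdna.upper()
    if len(cdna) < length:
        raise ValueError("sequence shorter than the requested oligo length")
    atg = cdna.find("ATG")
    if atg < 0:
        raise ValueError("no ATG found")
    # Place the ATG roughly in the middle of the window, clamped to the sequence ends.
    start = max(0, min(atg - (length - 3) // 2, len(cdna) - length))
    return reverse_complement(cdna[start:start + length])
```

For the toy sequence GGGGGATGCCCCC and a 9-nt oligo, the window is GGGATGCCC and the antisense oligo is GGGCATCCC, which contains CAT, the reverse complement of the ATG.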

The antisense nucleic acids should have a length and melting temperature sufficient to 
permit formation of an intracellular duplex having sufficient stability to inhibit the expression of 
the sbgl, g34665, sbg2, g35017 or g35018 mRNA in the duplex. Strategies for designing
antisense nucleic acids suitable for use in gene therapy are disclosed in Green et al. (1986) and
Izant and Weintraub (1984).

In some strategies, antisense molecules are obtained by reversing the orientation of the
sbgl, g34665, sbg2, g35017 or g35018 coding region with respect to a promoter so as to
transcribe the opposite strand from that which is normally transcribed in the cell. The antisense
molecules may be transcribed using in vitro transcription systems such as those which employ
T7 or SP6 polymerase to generate the transcript. Another approach involves transcription of
sbgl, g34665, sbg2, g35017 or g35018 antisense nucleic acids in vivo by operably linking DNA
containing the antisense sequence to a promoter in a suitable expression vector.

Alternatively, suitable antisense strategies are those described by Rossi et al. (1991), in
International Applications Nos. WO 94/23026, WO 95/04141 and WO 92/18522, and in
European Patent Application No. EP 0 572 287 A2.

An alternative to the antisense technology that is used according to the present 
invention comprises using ribozymes that will bind to a target sequence via their 
complementary polynucleotide tail and that will cleave the corresponding RNA by hydrolyzing 
its target site (so-called "hammerhead ribozymes"). Briefly, the simplified cycle of a
hammerhead ribozyme comprises (1) sequence-specific binding to the target RNA via
complementary antisense sequences; (2) site-specific hydrolysis of the cleavable motif of the
target strand; and (3) release of cleavage products, which gives rise to another catalytic cycle.
The use of long-chain antisense polynucleotides (at least 30 bases long) or of ribozymes
with long antisense arms is advantageous. Preferred delivery systems for antisense ribozymes
involve covalently linking these antisense ribozymes to lipophilic groups or using
liposomes as a convenient vector. Preferred antisense ribozymes according to the present
invention are prepared as described by Sczakiel et al. (1995).

Triple Helix Approach 

The sbgl, g34665, sbg2, g35017 or g35018 genomic DNA may also be used to inhibit 
the expression of the sbgl, g34665, sbg2, g35017 or g35018 gene based on intracellular triple 
helix formation. 

Triple helix oligonucleotides are used to inhibit transcription from a genome. They are 
particularly useful for studying alterations in cell activity when it is associated with a particular 
gene. 

Similarly, a portion of the sbgl, g34665, sbg2, g35017 or g35018 genomic DNA can be 
used to study the effect of inhibiting sbgl, g34665, sbg2, g35017 or g35018 transcription within
a cell. Traditionally, homopurine sequences were considered the most useful for triple helix
strategies. However, homopyrimidine sequences can also inhibit gene expression. Such
homopyrimidine oligonucleotides bind to the major groove at homopurine:homopyrimidine
sequences. Thus, both types of sequences from the sbgl, g34665, sbg2, g35017 or g35018 
genomic DNA are contemplated within the scope of this invention. 

To carry out gene therapy strategies using the triple helix approach, the sequences of
the sbgl, g34665, sbg2, g35017 or g35018 genomic DNA are first scanned to identify 10-mer to
20-mer homopyrimidine or homopurine stretches which could be used in triple-helix based
strategies for inhibiting sbgl, g34665, sbg2, g35017 or g35018 expression. Following
identification of candidate homopyrimidine or homopurine stretches, their efficiency in
inhibiting sbgl, g34665, sbg2, g35017 or g35018 expression is assessed by introducing varying
amounts of oligonucleotides containing the candidate sequences into tissue culture cells which 
express the sbgl, g34665, sbg2, g35017 or g35018 gene. 
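The initial scan can be sketched as a search for maximal runs of purines (A, G) or pyrimidines (C, T) on one strand. Because a homopurine run on one strand pairs with a homopyrimidine run on the other, a run of either kind marks a homopurine:homopyrimidine tract. The sketch assumes 1-based coordinates; runs longer than 20 nt are reported whole and contain several candidate 10- to 20-mers. The function name is hypothetical.

```python
import re


def find_triplex_candidates(genomic, min_len=10):
    """Find maximal homopurine or homopyrimidine runs of at least min_len nt.

    Returns (kind, start, end, run) tuples with 1-based inclusive coordinates,
    sorted by start position.
    """
    seq = genomic.upper()
    hits = []
    for kind, pattern in (("homopurine", r"[AG]+"), ("homopyrimidine", r"[CT]+")):
        for match in re.finditer(pattern, seq):
            run = match.group()
            if len(run) >= min_len:
                hits.append((kind, match.start() + 1, match.end(), run))
    return sorted(hits, key=lambda hit: hit[1])
```

Each reported run would then be tested experimentally in cultured cells, as described above; the scan only narrows down which stretches are worth synthesizing.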

The oligonucleotides can be introduced into the cells using a variety of methods known 
to those skilled in the art, including but not limited to calcium phosphate precipitation, DEAE- 
Dextran, electroporation, liposome-mediated transfection or native uptake. 

Treated cells are monitored for altered cell function or reduced sbgl, g34665, sbg2, 
g35017 or g35018 expression using techniques such as Northern blotting, RNase protection 
assays, or PCR based strategies to monitor the transcription levels of the sbgl, g34665, sbg2, 
g35017 or g35018 gene in cells which have been treated with the oligonucleotide. 

The oligonucleotides which are effective in inhibiting gene expression in tissue culture
cells may then be introduced in vivo using the techniques described above for the antisense
approach, at a dosage calculated on the basis of the in vitro results.

In some embodiments, the natural (beta) anomers of the oligonucleotide units can be 
replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, 
an intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the 
alpha oligonucleotide to stabilize the triple helix. For information on the generation of 
oligonucleotides suitable for triple helix formation, see Griffin et al. (1989).

Pharmaceutical Compositions And Formulations 

Sbgl-modulating Compounds

Using the methods disclosed herein, compounds that selectively modulate sbgl activity in 
vitro and in vivo may be identified. The compounds identified by the process of the invention 
include, for example, antibodies having binding specificity for the sbgl peptide. It is also 
expected that homologues of sbgl may be useful for modulating sbgl-mediated activity and the
related physiological conditions associated with schizophrenia or bipolar disorder. Generally, it is
further expected that assay methods of the present invention based on the role of sbgl in central
nervous system disorders may be used to identify compounds capable of intervening in the assay
cascade of the invention.
Indications 

While an association of sbgl with schizophrenia and bipolar disorder has been demonstrated,
indications involving sbgl may include various central nervous system disorders. Nervous system
disorders are expected to have complex genetic bases and often share certain symptoms. In
particular, as described herein, indications may include schizophrenia and other psychotic
disorders, mood disorders, autism, substance dependence and alcoholism, mental retardation, and
other psychiatric diseases including cognitive, anxiety, eating, impulse-control, and personality
disorders, as defined in the Diagnostic and Statistical Manual of Mental Disorders, fourth edition
(DSM-IV).

Pharmaceutical Formulations and Routes of Administration

The compounds identified using the methods of the present invention can be administered 
to a mammal, including a human patient, alone or in pharmaceutical compositions where they are 
mixed with suitable carriers or excipient(s) at therapeutically effective doses to treat or ameliorate
schizophrenia, bipolar disorder or related disorders. A therapeutically effective dose further refers
to that amount of the compound sufficient to result in amelioration of symptoms as determined by 
the methods described herein. Preferably, a therapeutically effective dosage is suitable for 
continued periodic use or administration. Techniques for formulation and administration of the 
compounds of the instant application may be found in "Remington's Pharmaceutical Sciences," 
Mack Publishing Co., Easton, PA, latest edition. 
Routes of Administration 

Suitable routes of administration include oral, rectal, transmucosal, or intestinal
administration, parenteral delivery, including intramuscular, subcutaneous, intramedullary 
injections, as well as intrathecal, direct intraventricular, intravenous, intraperitoneal, intranasal or 
intraocular injections. A particularly useful method of administering compounds for treating 
central nervous system disease involves surgical implantation of a device for delivering the 
compound over an extended period of time. Sustained-release formulations of the medicaments
of the invention are particularly contemplated.

Composition/Formulation 

Pharmaceutical compositions and medicaments for use in accordance with the present 
invention may be formulated in a conventional manner using one or more physiologically 
acceptable carriers comprising excipients and auxiliaries. Proper formulation is dependent upon 
the route of administration chosen. 

For injection, the agents of the invention may be formulated in aqueous solutions, 
preferably in physiologically compatible buffers such as Hanks' solution, Ringer's solution, or
physiological saline buffer such as a phosphate or bicarbonate buffer. For transmucosal 
administration, penetrants appropriate to the barrier to be permeated are used in the formulation. 
Such penetrants are generally known in the art. 

Pharmaceutical preparations which can be used orally include push-fit capsules made of 
gelatin, as well as soft, sealed capsules made of gelatin and a plasticizer, such as glycerol or 
sorbitol. The push-fit capsules can contain the active ingredients in admixture with fillers such as
lactose, binders such as starches, and/or lubricants such as talc or magnesium stearate and, 
optionally, stabilizers. In soft capsules, the active compounds may be dissolved or suspended in 
suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycols. In addition, 
stabilizers may be added. All formulations for oral administration should be in dosages suitable 
for such administration. 

For buccal administration, the compositions may take the form of tablets or lozenges
formulated in conventional manner. 

For administration by inhalation, the compounds for use according to the present 
invention are conveniently delivered in the form of an aerosol spray presentation from pressurized 
packs or a nebulizer, with the use of a suitable gaseous propellant, e.g., carbon dioxide. In the 
case of a pressurized aerosol the dosage unit may be determined by providing a valve to deliver a 
metered amount. Capsules and cartridges of, e.g., gelatin, for use in an inhaler or insufflator, may 
be formulated containing a powder mix of the compound and a suitable powder base such as 
lactose or starch. 

The compounds may be formulated for parenteral administration by injection, e.g., by 
bolus injection or continuous infusion. Formulations for injection may be presented in unit 
dosage form, e.g., in ampoules or in multi-dose containers, with an added preservative. The 
compositions may take such forms as suspensions, solutions or emulsions in aqueous vehicles, 
and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents. 

Pharmaceutical formulations for parenteral administration include aqueous solutions of 
the active compounds in water-soluble form. Aqueous suspensions may contain substances which 
increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol, or 
dextran. Optionally, the suspension may also contain suitable stabilizers or agents which increase 
the solubility of the compounds to allow for the preparation of highly concentrated solutions. 

Alternatively, the active ingredient may be in powder or lyophilized form for constitution 
with a suitable vehicle, such as sterile pyrogen-free water, before use. 

In addition to the formulations described previously, the compounds may also be 
formulated as a depot preparation. Such long acting formulations may be administered by 
implantation (for example subcutaneously or intramuscularly) or by intramuscular injection.
Thus, for example, the compounds may be formulated with suitable polymeric or hydrophobic 
materials (for example as an emulsion in an acceptable oil) or ion exchange resins, or as sparingly 
soluble derivatives, for example, as a sparingly soluble salt. 

Additionally, the compounds may be delivered using a sustained-release system such as 
semipermeable matrices of solid hydrophobic polymers containing the therapeutic agent. Various
sustained-release materials have been established and are well known by those skilled in the art.
Sustained-release capsules may, depending on their chemical nature, release the compounds for a
few weeks up to over 100 days.

Depending on the chemical nature and the biological stability of the therapeutic reagent, 
additional strategies for protein stabilization may be employed. 

The pharmaceutical compositions also may comprise suitable solid or gel phase carriers 
or excipients. Examples of such carriers or excipients include but are not limited to calcium
carbonate, calcium phosphate, various sugars, starches, cellulose derivatives, gelatin, and 
polymers such as polyethylene glycols. 

Effective Dosage

Pharmaceutical compositions suitable for use in the present invention include 
compositions wherein the active ingredients are contained in an effective amount to achieve their 
intended purpose. More specifically, a therapeutically effective amount means an amount 
effective to prevent development of or to alleviate the existing symptoms of the subject being 
treated. Determination of the effective amounts is well within the capability of those skilled in the 
art, especially in light of the detailed disclosure provided herein. 

For any compound used in the method of the invention, the therapeutically effective dose
can be estimated initially from cell culture assays, and a dose can be formulated in animal models.
Such information can be used to more accurately determine useful doses in humans. 

A therapeutically effective dose refers to that amount of the compound that results in 
amelioration of symptoms in a patient. Toxicity and therapeutic efficacy of such compounds can 
be determined by standard pharmaceutical procedures in cell cultures or experimental animals,
e.g., for determining the LD50 (the dose lethal to 50% of the test population) and the ED50 (the
dose therapeutically effective in 50% of the population). The dose ratio between toxic and
therapeutic effects is the therapeutic index and it can be expressed as the ratio between LD50 and
ED50 (LD50/ED50). Compounds which exhibit high therapeutic indices are preferred.
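As a worked illustration of this definition, with purely hypothetical dose values (not data from this specification): a compound with an LD50 of 200 mg/kg and an ED50 of 10 mg/kg has a therapeutic index of 200/10 = 20. The same arithmetic, used to rank candidate compounds by therapeutic index, can be sketched as follows; the function names are illustrative.

```python
def therapeutic_index(ld50, ed50):
    """Therapeutic index expressed as the ratio LD50/ED50 (same dose units)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


def rank_by_therapeutic_index(compounds):
    """Sort {name: (ld50, ed50)} entries from highest to lowest index."""
    return sorted(
        compounds,
        key=lambda name: therapeutic_index(*compounds[name]),
        reverse=True,
    )
```

A hypothetical compound X with (200, 10) therefore ranks ahead of compound Y with (300, 50), whose index is only 6, even though Y has the higher LD50.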

The data obtained from these cell culture assays and animal studies can be used in 
formulating a range of dosage for use in humans. The dosage of such compounds lies preferably
within a range of circulating concentrations that include the ED50 with little or no toxicity. The
dosage may vary within this range depending upon the dosage form employed and the route of
administration utilized. The exact formulation, route of administration and dosage can be chosen 
by the individual physician in view of the patient's condition. (See, e.g., Fingl et al., 1975, in "The 
Pharmacological Basis of Therapeutics", Ch. 1). 


Computer-Related Embodiments 

As used herein the term "nucleic acid codes of the invention" encompasses the nucleotide
sequences comprising, consisting essentially of, or consisting of any one of the following: 

a) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100,
150, 200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1, and the complements thereof,
wherein said contiguous span comprises at least one of the following nucleotide positions of
SEQ ID No. 1: 31 to 292651 and 292844 to 319608.

b) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100,
150, 200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos. 54 to 229, and the complements
thereof, to the extent that such a length is consistent with the particular sequence ID.

c) a contiguous span of at least 8, 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 
75, 80, 90, 100 or 200 nucleotides, to the extent that such a length is consistent with the
particular sequence ID, of SEQ ID Nos. 2 to 26, 36 to 40 or the complements thereof. 

d) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
80, 90 or 100 nucleotides of SEQ ID No. 1 or the complements thereof, wherein said contiguous
span comprises at least one of the following nucleotide positions of SEQ ID No. 1:

(i) 292653 to 296047, 292653 to 292841, 295555 to 296047 and 295580 to 296047;

(ii) 31 to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to 25740,
29388 to 29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 and 65854 to 67854;

(iii) 94124 to 94964;

(iv) 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952, 216661 to
217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to 231412, 231787 to
231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719 to 239807, 239719 to
239853, 240528 to 240569, 240528 to 240596, 240528 to 240617, 240528 to 240644, 240528 to
240824, 240528 to 240994, 240528 to 241685, 240800 to 240993 and 241686 to 243685; and

(v) 201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and 216836 to
216915;
e) a contiguous span according to a), b), c) or d), wherein said span includes a biallelic
marker selected from the group consisting of A1 to A489.

f) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100,
150, 200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1 or the complements thereof, wherein
said contiguous span comprises at least 1, 2, 3, 5, or 10 nucleotide positions of any one of the
ranges of nucleotide positions designated pos1 to pos166 of SEQ ID No. 1 listed in Table 1
above;

g) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100,
150, 200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos. 2 to 26, 36 to 42, 44 to 48 and
52 to 269, and the complements thereof, wherein said span includes a chromosome 13q31-q33-
related biallelic marker, a Region D-related biallelic marker, or an sbgl-, g34665-, sbg2-, g35017-
or g35018-related biallelic marker;

h) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100,
150, 200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos 2 to 26, 36 to 40 and 54 to 229,
and the complements thereof, wherein said span includes a chromosome 13q31-q33-related
biallelic marker, a Region D-related biallelic marker, or an sbgl-, g34665-, sbg2-, g35017- or
g35018-related biallelic marker, with the alternative allele present at said biallelic marker;

i) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100,
150, 200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1, and the complements thereof,
wherein said span includes a polymorphism selected from the group consisting of A1 to A69,
A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to
A222, A224 to A242 and A361 to A489.
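Checking whether a candidate span falls under item a) reduces to an interval-overlap test against the listed position ranges plus a minimum-length check. A minimal sketch, assuming 1-based inclusive coordinates on SEQ ID No. 1; the constant and function names are hypothetical, and the same overlap test applies to the ranges listed under item d).

```python
# Item a) position ranges on SEQ ID No. 1 (1-based, inclusive).
ITEM_A_RANGES = [(31, 292651), (292844, 319608)]


def span_overlaps(start, end, ranges):
    """True if the inclusive span [start, end] shares a position with any range."""
    if start > end:
        raise ValueError("start must not exceed end")
    return any(start <= r_end and end >= r_start for r_start, r_end in ranges)


def qualifies_item_a(start, end, min_length=12):
    """Span is at least min_length nt long and covers a listed position."""
    return end - start + 1 >= min_length and span_overlaps(start, end, ITEM_A_RANGES)
```

For example, a span lying entirely within positions 292652 to 292843 does not qualify, whereas a span starting at 292640 does, because it shares positions 292640 to 292651 with the first range.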

The "nucleic acid codes of the invention" further encompass nucleotide sequences 
homologous to a contiguous span of at least 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500,
1000 or 2000 nucleotides, to the extent that such a length is consistent with the particular
sequence of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, and the complements thereof. The
"nucleic acid codes of the invention" also encompass nucleotide sequences homologous to a
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90 or
100 nucleotides of SEQ ID No. 1 or the complements thereof, wherein said contiguous span
comprises at least one of the following nucleotide positions of SEQ ID No. 1:

(i) 292653 to 296047, 292653 to 292841, 295555 to 296047 and 295580 to 296047;

(ii) 31 to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to
25740, 29388 to 29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 and 65854 to 67854;

(iii) 94124 to 94964;

(iv) 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952,
216661 to 217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to
231412, 231787 to 231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719
to 239807, 239719 to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617,
240528 to 240644, 240528 to 240824, 240528 to 240994, 240528 to 241685, 240800 to 240993
and 241686 to 243685; and

(v) 201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and
216836 to 216915.

Homologous sequences refer to a sequence having at least 99%, 98%, 97%, 96%,
95%, 90%, 85%, 80%, or 75% homology to these contiguous spans. Homology may be
determined using any method described herein, including BLAST2N with the default
parameters or with any modified parameters. Homologous sequences also may include RNA
sequences in which uridines replace the thymines in the nucleic acid codes of the invention. It
will be appreciated that the nucleic acid codes of the invention can be represented in the
traditional single character format (see the inside back cover of Stryer, Lubert. Biochemistry,
3rd edition. W. H. Freeman & Co., New York) or in any other format or code which records
the identity of the nucleotides in a sequence.
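Two of these points can be sketched in a few lines: converting a DNA code to its RNA form by replacing T with U, and computing percent identity between two sequences that are already aligned position by position. The identity function is a deliberate simplification that ignores gaps; it is not BLAST2N, whose results come from local alignments with its own scoring, so it only illustrates what a percentage threshold such as 90% means. Function names are illustrative.

```python
def to_rna(dna):
    """RNA form of a DNA code: uridines replace thymines."""
    return dna.upper().replace("T", "U")


def percent_identity(seq_a, seq_b):
    """Percent of identical positions in two equal-length, pre-aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)
```

Two aligned 10-nt sequences differing at a single position are 90% identical under this measure.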

As used herein the term "polypeptide codes of SEQ ID Nos. 27 to 35 and 41 to 43"
encompasses the polypeptide sequences of SEQ ID Nos. 27 to 35 and 41 to 43, polypeptide
sequences homologous to the polypeptides of SEQ ID Nos. 27 to 35 and 41 to 43, or fragments of
any of the preceding sequences. Homologous polypeptide sequences refer to a polypeptide
sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to one of
the polypeptide sequences of SEQ ID Nos. 27 to 35 and 41 to 43. Homology may be determined
using any of the computer programs and parameters described herein, including FASTA with the
default parameters or with any modified parameters. The homologous sequences may be obtained
using any of the procedures described herein or may result from the correction of a sequencing
error as described above. The polypeptide fragments comprise at least 4, 6, 8, 10, 15, 20, 25, 30, 35,
40, 50, 75, 100, or 150 consecutive amino acids of the polypeptides of SEQ ID Nos. 27 to 35 and
41 to 43. Preferably, the fragments are novel fragments. It will be appreciated that the polypeptide
codes of SEQ ID Nos. 27 to 35 and 41 to 43 can be represented in the traditional single
character format or three letter format (See the inside back cover of Stryer, Lubert. Biochemistry,
3rd edition. W. H. Freeman & Co., New York.) or in any other format which relates the identity of
the amino acids in a sequence.
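Because the single character and three letter formats carry the same information, converting between them is purely mechanical. The following is a minimal Python sketch using the standard IUPAC three-letter abbreviations for the 20 common amino acids; the function names and the hyphen separator are illustrative choices and not part of the invention.

```python
# Standard IUPAC one-letter -> three-letter amino acid abbreviations.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(seq: str, sep: str = "-") -> str:
    """Convert a one-letter polypeptide code into three-letter format."""
    return sep.join(ONE_TO_THREE[aa] for aa in seq.upper())


def to_one_letter(seq: str, sep: str = "-") -> str:
    """Convert a three-letter polypeptide code back into one-letter format."""
    return "".join(THREE_TO_ONE[res] for res in seq.split(sep))
```

For example, the hypothetical peptide `"MKT"` becomes `"Met-Lys-Thr"`, and converting back returns `"MKT"`.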

It will be appreciated by those skilled in the art that the nucleic acid codes of SEQ ID
Nos. 1 to 26, 36 to 40 and 54 to 229 and the polypeptide codes of SEQ ID Nos. 27 to 35 and 41 to
43 can be stored, recorded, and manipulated on any medium which can be read and accessed by a
computer. As used herein, the words "recorded" and "stored" refer to a process for storing
information on a computer medium. A skilled artisan can readily adopt any of the presently known
methods for recording information on a computer readable medium to generate embodiments
comprising one or more of the nucleic acid codes of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, or
one or more of the polypeptide codes of SEQ ID Nos. 27 to 35 and 41 to 43. Another aspect of
the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15,
20, 25, 30, or 50 nucleic acid codes of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229. Another
aspect of the present invention is a computer readable medium having recorded thereon at least 2,
5, 10, 15, 20, 25, 30, or 50 polypeptide codes of SEQ ID Nos. 27 to 35 and 41 to 43.
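As one concrete illustration of recording sequence codes on a computer readable medium, the Python sketch below writes (name, sequence) pairs to a plain-text FASTA file and reads them back. FASTA is an assumption chosen because it is widely used; the text above does not prescribe any particular format, and the record names in the demonstration are hypothetical.

```python
import os
import tempfile


def write_fasta(path, records, width=60):
    """Record an iterable of (name, sequence) pairs as a FASTA text file,
    wrapping each sequence at `width` characters per line."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def read_fasta(path):
    """Read a FASTA file back into a {name: sequence} dictionary."""
    records, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Hypothetical records; real entries would hold the SEQ ID sequences.
    demo = [("nucleic_acid_code_example", "ACGT" * 40),
            ("polypeptide_code_example", "MKTAYIAKQR")]
    path = os.path.join(tempfile.mkdtemp(), "codes.fasta")
    write_fasta(path, demo)
    assert read_fasta(path) == dict(demo)
```

The same file can then be copied to any of the media listed below without changing its content.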

Computer readable media include magnetically readable media, optically readable media,
electronically readable media and magnetic/optical media. For example, the computer readable
media may be a hard disk, a floppy disk, a magnetic tape, a CD-ROM, a Digital Versatile Disk (DVD),
Random Access Memory (RAM), or Read Only Memory (ROM), as well as other types of
media known to those skilled in the art.

Embodiments of the present invention include systems, particularly computer systems,
which store and manipulate the sequence information described herein. One example of a
computer system 100 is illustrated in block diagram form in Figure 19. As used herein, "a
computer system" refers to the hardware components, software components, and data storage
components used to analyze the nucleotide sequences of the nucleic acid codes of SEQ ID Nos. 1
to 26, 36 to 40 and 54 to 229, or the amino acid sequences of the polypeptide codes of SEQ ID
Nos. 27 to 35 and 41 to 43. In one embodiment, the computer system 100 is a Sun Enterprise
1000 server (Sun Microsystems, Palo Alto, CA). The computer system 100 preferably includes a
processor 105 for processing, accessing and manipulating the sequence data. The processor 105 can be
any well-known type of central processing unit, such as the Pentium III from Intel Corporation, or a
similar processor from Sun, Motorola, Compaq or International Business Machines.

Preferably, the computer system 100 is a general purpose system that comprises the
processor 105 and one or more internal data storage components 110 for storing data, and one or
more data retrieving devices for retrieving the data stored on the data storage components. A
skilled artisan can readily appreciate that any one of the currently available computer systems is
suitable.

In one particular embodiment, the computer system 100 includes a processor 105
connected to a bus which is connected to a main memory 115 (preferably implemented as RAM)
and one or more internal data storage devices 110, such as a hard drive and/or other computer
readable media having data recorded thereon. In some embodiments, the computer system 100
further includes one or more data retrieving devices 118 for reading the data stored on the internal
data storage devices 110.

The data retrieving device 118 may represent, for example, a floppy disk drive, a compact
disk drive, a magnetic tape drive, etc. In some embodiments, the internal data storage device 110 is
a removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc.
containing control logic and/or data recorded thereon. The computer system 100 may
advantageously include or be programmed by appropriate software for reading the control logic
and/or the data from the data storage component once inserted in the data retrieving device.

The computer system 100 includes a display 120 which is used to display output to a 
computer user. It should also be noted that the computer system 100 can be linked to other 
computer systems 125a-c in a network or wide area network to provide centralized access to the 
computer system 100. 

Software for accessing and processing the nucleotide sequences of the nucleic acid codes 
of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229, or the amino acid sequences of the polypeptide 
codes of SEQ ID Nos. 27 to 35 and 41 to 43 (such as search tools, compare tools, and modeling 
tools etc.) may reside in main memory 115 during execution. 

In some embodiments, the computer system 100 may further comprise a sequence
comparer for comparing the above-described nucleic acid codes of SEQ ID Nos. 1 to 26, 36 to 40
and 54 to 229 or polypeptide codes of SEQ ID Nos. 27 to 35 and 41 to 43 stored on a computer
readable medium to reference nucleotide or polypeptide sequences stored on a computer readable
medium. A "sequence comparer" refers to one or more programs which are implemented on the
computer system 100 to compare a nucleotide or polypeptide sequence with other nucleotide or
polypeptide sequences and/or compounds including but not limited to peptides, peptidomimetics,
and chemicals stored within the data storage means. For example, the sequence comparer may
compare the nucleotide sequences of the nucleic acid codes of SEQ ID Nos. 1 to 26, 36 to 40 and
54 to 229, or the amino acid sequences of the polypeptide codes of SEQ ID Nos. 27 to 35 and 41
to 43 stored on a computer readable medium to reference sequences stored on a computer readable
medium to identify homologies, motifs implicated in biological function, or structural motifs. The
various sequence comparer programs identified elsewhere in this patent specification are
particularly contemplated for use in this aspect of the invention.

Figure 20 is a flow diagram illustrating one embodiment of a process 200 for comparing a 
new nucleotide or protein sequence with a database of sequences in order to determine the 
homology levels between the new sequence and the sequences in the database. The database of 
sequences can be a private database stored within the computer system 100, or a public database 
such as GENBANK, PIR or SWISSPROT, that is available through the Internet.

The process 200 begins at a start state 201 and then moves to a state 202 wherein the new
sequence to be compared is stored to a memory in a computer system 100. As discussed above, the
memory could be any type of memory, including RAM or an internal storage device.

The process 200 then moves to a state 204 wherein a database of sequences is opened for
analysis and comparison. The process 200 then moves to a state 206 wherein the first sequence
stored in the database is read into a memory on the computer. A comparison is then performed at a
state 210 to determine if the new sequence is the same as the first sequence in the database. It is
important to note that this step is not limited to performing an exact comparison between the new
sequence and the first sequence in the database. Methods for comparing two nucleotide or protein
sequences, even if they are not identical, are well known to those of skill in the art. For example, gaps
can be introduced into one sequence in order to raise the homology level between the two tested
sequences. The parameters that control whether gaps or other features are introduced into a
sequence during comparison are normally entered by the user of the computer system.

Once a comparison of the two sequences has been performed at the state 210, a
determination is made at a decision state 212 whether the two sequences are the same. Of course,
the term "same" is not limited to sequences that are absolutely identical. Sequences that are within
the homology parameters entered by the user will be marked as "same" in the process 200.

If a determination is made that the two sequences are the same, the process 200 moves to a
state 214 wherein the name of the sequence from the database is displayed to the user. This state
notifies the user that the sequence with the displayed name fulfills the homology constraints that
were entered. Once the name of the stored sequence is displayed to the user, the process 200
moves to a decision state 218 wherein a determination is made whether more sequences exist in the
database. If no more sequences exist in the database, then the process 200 terminates at an end
state 220. However, if more sequences do exist in the database, then the process 200 moves to a
state 224 wherein a pointer is moved to the next sequence in the database so that it can be
compared to the new sequence. In this manner, the new sequence is aligned and compared with
every sequence in the database.

It should be noted that if a determination had been made at the decision state 212 that the
sequences were not homologous, then the process 200 would move immediately to the decision
state 218 in order to determine if any other sequences were available in the database for
comparison.
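The loop of process 200 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the comparison program of the invention: each database entry is scored with a simple Needleman-Wunsch global alignment (assumed scoring of +1 per match, -1 per mismatch, -1 per gap position) so that gaps can raise the homology level as described, and the names of entries whose percent identity meets a user-supplied threshold are reported. In practice a tool such as BLAST or FASTA would normally be used instead.

```python
def percent_identity(a: str, b: str, match=1, mismatch=-1, gap=-1) -> float:
    """Globally align a and b (Needleman-Wunsch) and return the percentage of
    alignment columns that hold identical characters."""
    n, m = len(a), len(b)
    if n == 0 and m == 0:
        return 0.0
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back one optimal alignment, counting columns and identities.
    i, j, same, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return 100.0 * same / columns


def search_database(new_seq, database, min_identity):
    """Process 200: compare new_seq with every database entry in turn
    (states 206 to 224) and return (name, identity) for entries that meet
    the user's homology threshold."""
    hits = []
    for name, stored in database.items():
        identity = percent_identity(new_seq, stored)   # state 210
        if identity >= min_identity:                    # decision state 212
            hits.append((name, identity))               # state 214
    return hits                                         # end state 220
```

With a hypothetical database `{"ref1": "ACGTACGT", "ref2": "ACGTTACGT", "ref3": "GGGGGGGG"}` and a threshold of 80%, the query `"ACGTACGT"` matches `ref1` at 100% and `ref2` at about 88.9% (8 identical columns out of 9 once one gap is introduced), while `ref3` is rejected.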

Accordingly, one aspect of the present invention is a computer system comprising a 
processor, a data storage device having stored thereon a nucleic acid code of SEQ ID NOs. 1 to 
26, 36 to 40 and 54 to 229 or a polypeptide code of SEQ ID Nos 27 to 35 and 41 to 43, a data 
storage device having retrievably stored thereon reference nucleotide sequences or polypeptide 
sequences to be compared to the nucleic acid code of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 
229 or polypeptide code of SEQ ID Nos. 27 to 35 and 41 to 43 and a sequence comparer for 
conducting the comparison. The sequence comparer may indicate a homology level between the 
sequences compared or identify structural motifs in the above described nucleic acid code of 
SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 and polypeptide codes of SEQ ID Nos. 27 to 35 
and 41 to 43, or it may identify structural motifs in sequences which are compared to these
nucleic acid codes and polypeptide codes. In some embodiments, the data storage device may 
have stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid 
codes of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 or polypeptide codes of SEQ ID Nos. 27 
to 35 and 41 to 43. 

Another aspect of the present invention is a method for determining the level of homology 
between a nucleic acid code of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 and a reference 
nucleotide sequence, comprising the steps of reading the nucleic acid code and the reference 
nucleotide sequence through the use of a computer program which determines homology levels and 
determining homology between the nucleic acid code and the reference nucleotide sequence with 
the computer program. The computer program may be any of a number of computer programs for 
determining homology levels, including those specifically enumerated herein, including BLAST2N 
with the default parameters or with any modified parameters. The method may be implemented 
using the computer systems described above. The method may also be performed by reading 2, 5, 
10, 15, 20, 25, 30, or 50 of the above described nucleic acid codes of SEQ ID Nos. 1 to 26, 36 to 
40 and 54 to 229 through use of the computer program and determining homology between the 
nucleic acid codes and reference nucleotide sequences.

Figure 21 is a flow diagram illustrating one embodiment of a process 250 in a computer 
for determining whether two sequences are homologous. The process 250 begins at a start state 
252 and then moves to a state 254 wherein a first sequence to be compared is stored to a 
memory. The second sequence to be compared is then stored to a memory at a state 256. The 
process 250 then moves to a state 260 wherein the first character in the first sequence is read 
and then to a state 262 wherein the first character of the second sequence is read. It should be 
understood that if the sequence is a nucleotide sequence, then the character would normally be 
either A, T, C, G or U. If the sequence is a protein sequence, then it should be in the single
letter amino acid code so that the first and second sequences can be easily compared.

A determination is then made at a decision state 264 whether the two characters are the
same. If they are the same, then the process 250 moves to a state 268 wherein the next
characters in the first and second sequences are read. A determination is then made whether
the next characters are the same. If they are, then the process 250 continues this loop until two
characters are not the same. If a determination is made that the next two characters are not the
same, the process 250 moves to a decision state 274 to determine whether there are any more
characters in either sequence to read.

If there are no more characters to read, then the process 250 moves to a state 276
wherein the level of homology between the first and second sequences is displayed to the user.
The level of homology is determined by calculating the proportion of characters between the
sequences that were the same out of the total number of characters in the first sequence. Thus,
if every character in a first 100-nucleotide sequence aligned with every character in a second
sequence, the homology level would be 100%.
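A minimal Python sketch of the character-by-character comparison of process 250 follows. It assumes the two sequences are already in register (ungapped, starting at corresponding positions) and, as in the text, divides the number of identical characters by the length of the first sequence; the function name is illustrative.

```python
def homology_level(first: str, second: str) -> float:
    """Process 250: walk both sequences one character at a time and return the
    percentage of positions holding the same character, relative to the total
    number of characters in the first sequence. No gaps are introduced."""
    if not first:
        raise ValueError("the first sequence is empty")
    same = 0
    # States 260 to 268: read the next character of each sequence in step.
    for a, b in zip(first.upper(), second.upper()):
        if a == b:  # decision state 264
            same += 1
    # State 276: report the proportion of identical characters.
    return 100.0 * same / len(first)
```

Two identical 100-nucleotide sequences give 100.0, while `"ACGT"` compared with `"ACGA"` gives 75.0.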

Alternatively, the computer program may be a computer program which compares the
nucleotide sequences of the nucleic acid codes of the present invention to reference nucleotide
sequences in order to determine whether the nucleic acid code of SEQ ID Nos. 1 to 26, 36 to 40
and 54 to 229 differs from a reference nucleic acid sequence at one or more positions. Optionally,
such a program records the length and identity of inserted, deleted or substituted nucleotides with
respect to the sequence of either the reference polynucleotide or the nucleic acid code of SEQ ID
Nos. 1 to 26, 36 to 40 and 54 to 229. In one embodiment, the computer program may be a
program which determines whether the nucleotide sequences of the nucleic acid codes of SEQ ID
Nos. 1 to 26, 36 to 40 and 54 to 229 contain a biallelic marker or single nucleotide polymorphism
(SNP) with respect to a reference nucleotide sequence. This single nucleotide polymorphism may
comprise a single base substitution, insertion, or deletion, while this biallelic marker may
comprise about one to ten consecutive bases substituted, inserted or deleted.
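One way such a program might record the length and identity of substituted, inserted or deleted nucleotides is sketched below in Python. It assumes the nucleic acid code and the reference have already been aligned into two equal-length strings with "-" marking gaps (the alignment step itself is not shown), and it merges adjacent columns of the same kind into one event, so a single base substitution appears as a one-base event and a multi-base indel as a single longer event. The function name and output layout are hypothetical.

```python
def find_differences(ref_aln: str, query_aln: str):
    """Compare two pre-aligned, equal-length sequences ('-' = gap) and return
    (kind, ref_position, ref_bases, query_bases) tuples. ref_position is the
    1-based reference coordinate of the first affected base; for insertions it
    is the reference base immediately preceding the inserted bases."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    events = []     # mutable [kind, ref_position, ref_bases, query_bases]
    ref_pos = 0     # number of reference bases consumed so far
    prev_col = -2   # column index of the previous differing column
    for col, (r, q) in enumerate(zip(ref_aln.upper(), query_aln.upper())):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue
        if r == "-":
            kind = "insertion"
        elif q == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        if events and events[-1][0] == kind and prev_col == col - 1:
            events[-1][2] += r.replace("-", "")   # extend the current run
            events[-1][3] += q.replace("-", "")
        else:
            events.append([kind, ref_pos, r.replace("-", ""), q.replace("-", "")])
        prev_col = col
    return [tuple(e) for e in events]
```

For the aligned pair `"ACGT-ACGTA"` (reference) and `"ACCTTAC--A"` (code), the sketch reports a G to C substitution at position 3, a one-base T insertion after position 4, and a two-base GT deletion starting at position 7.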

Another aspect of the present invention is a method for determining the level of homology
between a polypeptide code of SEQ ID Nos. 27 to 35 and 41 to 43 and a reference polypeptide
sequence, comprising the steps of reading the polypeptide code of SEQ ID Nos. 27 to 35 and 41 to
43 and the reference polypeptide sequence through use of a computer program which determines
homology levels and determining homology between the polypeptide code and the reference
polypeptide sequence using the computer program.

Accordingly, another aspect of the present invention is a method for determining whether a
nucleic acid code of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 differs at one or more nucleotides
from a reference nucleotide sequence comprising the steps of reading the nucleic acid code and the
reference nucleotide sequence through use of a computer program which identifies differences
between nucleic acid sequences and identifying differences between the nucleic acid code and the
reference nucleotide sequence with the computer program. In some embodiments, the computer
program is a program which identifies single nucleotide polymorphisms. The method may be
implemented by the computer systems described above and the method illustrated in Figure 21.
The method may also be performed by reading at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic
acid codes of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 and the reference nucleotide sequences
through the use of the computer program and identifying differences between the nucleic acid
codes and the reference nucleotide sequences with the computer program.

In other embodiments the computer based system may further comprise an identifier for 
identifying features within the nucleotide sequences of the nucleic acid codes of SEQ ID Nos. 1 to 
26, 36 to 40 and 54 to 229 or the amino acid sequences of the polypeptide codes of SEQ ID Nos. 
27 to 35 and 41 to 43. 

An "identifier" refers to one or more programs which identify certain features within
the above-described nucleotide sequences of the nucleic acid codes of SEQ ID Nos. 1 to 26, 36
to 40 and 54 to 229 or the amino acid sequences of the polypeptide codes of SEQ ID Nos. 27 to
35 and 41 to 43. In one embodiment, the identifier may comprise a program which identifies an
open reading frame in the cDNA codes of SEQ ID Nos. 2 to 26 and 36 to 40.
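An open reading frame identifier of the kind described could be as simple as the following Python sketch. It scans the three forward reading frames for an ATG start codon followed in frame by a TAA, TAG or TGA stop codon. The standard genetic code, forward-strand-only scanning and the default minimum length of 30 codons are assumptions made for illustration; a practical tool would also scan the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 30):
    """Return (start, end, frame) for each ATG-initiated ORF on the forward
    strand. start/end are 0-based with end exclusive; the stop codon is
    included and counted toward min_codons."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs
```

For the toy sequence `"CCATGAAATTTTAGCC"` with `min_codons=1`, the sketch finds one ORF, `ATG AAA TTT TAG`, spanning positions 2 to 14 in frame 2.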

Figure 22 is a flow diagram illustrating one embodiment of an identifier process 300 for 
detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 
and then moves to a state 304 wherein a first sequence that is to be checked for features is 
stored to a memory 115 in the computer system 100. The process 300 then moves to a state 306
wherein a database of sequence features is opened. Such a database would include a list of each 
feature's attributes along with the name of the feature. For example, a feature name could be 
"Initiation Codon" and the attribute would be "ATG". Another example would be the feature 
name "TAATAA Box" and the feature attribute would be "TAATAA". An example of such a 
database is produced by the University of Wisconsin Genetics Computer Group 
(www.gcg.com). 

Once the database of features is opened at the state 306, the process 300 moves to a 
state 308 wherein the first feature is read from the database. A comparison of the attribute of 
the first feature with the first sequence is then made at a state 310. A determination is then
made at a decision state 316 whether the attribute of the feature was found in the first sequence.
If the attribute was found, then the process 300 moves to a state 318 wherein the name of the
found feature is displayed to the user. 

The process 300 then moves to a decision state 320 wherein a determination is made
whether more features exist in the database. If no more features exist, then the process 300
terminates at an end state 324. However, if more features do exist in the database, then the
process 300 reads the next sequence feature at a state 326 and loops back to the state 310
wherein the attribute of the next feature is compared against the first sequence.

It should be noted that if the feature attribute is not found in the first sequence at the
decision state 316, the process 300 moves directly to the decision state 320 in order to
determine if any more features exist in the database.
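Process 300 amounts to searching the sequence for each feature attribute in turn. The minimal Python sketch below uses an in-memory dictionary in place of a feature database; its two entries mirror the examples given above, and a real feature database would hold many more features, often as degenerate patterns rather than literal strings. The function name is illustrative.

```python
def identify_features(sequence: str, feature_db: dict) -> dict:
    """Process 300: for each feature in feature_db (name -> attribute string),
    search the sequence for the attribute and return {name: [0-based starts]}
    for every feature that is found, including overlapping occurrences."""
    seq = sequence.upper()
    found = {}
    for name, attribute in feature_db.items():     # states 308 and 326
        attr = attribute.upper()
        starts = [i for i in range(len(seq) - len(attr) + 1)
                  if seq.startswith(attr, i)]       # state 310
        if starts:                                  # decision state 316
            found[name] = starts                    # state 318
    return found                                    # end state 324


features = {"Initiation Codon": "ATG", "TAATAA Box": "TAATAA"}
print(identify_features("GGTAATAAGCATGCC", features))
```

Features whose attribute does not occur are simply omitted from the result, which corresponds to the direct move from decision state 316 to decision state 320.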

In another embodiment, the identifier may comprise a molecular modeling program
which determines the 3-dimensional structure of the polypeptide codes of SEQ ID Nos. 27 to
35 and 41 to 43. In some embodiments, the molecular modeling program identifies target
sequences that are most compatible with profiles representing the structural environments of the
residues in known three-dimensional protein structures. (See, e.g., Eisenberg et al., U.S. Patent
No. 5,436,850 issued July 25, 1995.) In another technique, the known three-dimensional
structures of proteins in a given family are superimposed to define the structurally conserved
regions in that family. This protein modeling technique also uses the known three-dimensional
structure of a homologous protein to approximate the structure of the polypeptide codes of SEQ
ID Nos. 27 to 35 and 41 to 43. (See, e.g., Srinivasan et al., U.S. Patent No. 5,557,535 issued
September 17, 1996.) Conventional homology modeling techniques have been used routinely to
build models of proteases and antibodies. (Sowdhamini et al., Protein Engineering 10:207-215
(1997).) Comparative approaches can also be used to develop three-dimensional protein models
when the protein of interest has poor sequence identity to template proteins. In some cases,
proteins fold into similar three-dimensional structures despite having very weak sequence
identities. For example, a number of helical cytokines fold into similar three-dimensional
topologies in spite of weak sequence homology.

The recent development of threading methods now enables the identification of likely
folding patterns in a number of situations where the structural relatedness between target and
template(s) is not detectable at the sequence level. Hybrid methods have also been developed, in
which fold recognition is performed using Multiple Sequence Threading (MST), structural
equivalencies are deduced from the threading output using the distance geometry program
DRAGON to construct a low resolution model, and a full-atom representation is constructed
using a molecular modeling package such as QUANTA.

According to this 3-step approach, candidate templates are first identified by using the
novel fold recognition algorithm MST, which is capable of performing simultaneous threading
of multiple aligned sequences onto one or more 3-D structures. In a second step, the structural
equivalencies obtained from the MST output are converted into interresidue distance restraints
and fed into the distance geometry program DRAGON, together with auxiliary information
obtained from secondary structure predictions. The program combines the restraints in an
unbiased manner and rapidly generates a large number of low resolution model conformations.
In a third step, these low resolution model conformations are converted into full-atom models
and subjected to energy minimization using the molecular modeling package QUANTA. (See,
e.g., Aszódi et al., Proteins: Structure, Function, and Genetics, Supplement 1:38-42 (1997).)
The results of the molecular modeling analysis may then be used in rational drug design
techniques to identify agents which modulate the activity of the polypeptide codes of SEQ ID 
Nos. 27 to 35 and 41 to 43. 

Accordingly, another aspect of the present invention is a method of identifying a feature 
within the nucleic acid codes of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 or the polypeptide 
codes of SEQ ID Nos. 27 to 35 and 41 to 43 comprising reading the nucleic acid code(s) or the 
polypeptide code(s) through the use of a computer program which identifies features therein and 
identifying features within the nucleic acid code(s) or polypeptide code(s) with the computer 
program. In one embodiment, the computer program comprises a computer program which
identifies open reading frames. In a further embodiment, the computer program identifies 
structural motifs in a polypeptide sequence. In another embodiment, the computer program 
comprises a molecular modeling program. The method may be performed by reading a single 
sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of SEQ ID Nos. 1 to 
26, 36 to 40 and 54 to 229 or the polypeptide codes of SEQ ID Nos. 27 to 35 and 41 to 43 
through the use of the computer program and identifying features within the nucleic acid codes 
or polypeptide codes with the computer program. 

The nucleic acid codes of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 or the
polypeptide codes of SEQ ID Nos. 27 to 35 and 41 to 43 may be stored and manipulated in a
variety of data processor programs in a variety of formats. For example, the nucleic acid codes of
SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 or the polypeptide codes of SEQ ID Nos. 27 to 35
and 41 to 43 may be stored as text in a word processing file, such as Microsoft WORD or
WORDPERFECT, or as an ASCII file in a variety of database programs familiar to those of skill in
the art, such as DB2, SYBASE, or ORACLE. In addition, many computer programs and databases
may be used as sequence comparers, identifiers, or sources of reference nucleotide or polypeptide
sequences to be compared to the nucleic acid codes of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to
229 or the polypeptide codes of SEQ ID Nos. 27 to 35 and 41 to 43. The following list is
intended not to limit the invention but to provide guidance to programs and databases which are
useful with the nucleic acid codes of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 or the
polypeptide codes of SEQ ID Nos. 27 to 35 and 41 to 43. The programs and databases which
may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase (Molecular
Applications Group), GeneMine (Molecular Applications Group), Look (Molecular Applications
Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and
BLASTX (Altschul et al., J. Mol. Biol. 215: 403 (1990)), FASTA (Pearson and Lipman, Proc. Natl.
Acad. Sci. USA, 85: 2444 (1988)), FASTDB (Brutlag et al., Comp. App. Biosci. 6:237-245, 1990),
Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations Inc.),
Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II
(Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular
Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.),
QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler
(Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design
(Molecular Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer
(Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular
Simulations Inc.), the EMBL/Swissprotein database, the MDL Available Chemicals Directory
database, the MDL Drug Data Report database, the Comprehensive Medicinal Chemistry
database, Derwent's World Drug Index database, the BioByte MasterFile database, the Genbank
database, and the Genseqn database. Many other programs and databases would be apparent to
one of skill in the art given the present disclosure.

Motifs which may be detected using the above programs include sequences encoding
leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices
and beta sheets, signal sequences encoding signal peptides which direct the secretion of the
encoded proteins, sequences implicated in transcription regulation such as homeoboxes, acidic
stretches, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
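Many of these motifs can be written as sequence patterns. As one example, the N-glycosylation site pattern N-{P}-[ST]-{P} (PROSITE entry PS00001: Asn, then any residue except Pro, then Ser or Thr, then any residue except Pro) can be searched with a regular expression, as in the Python sketch below. A zero-width lookahead is used so that overlapping sites are all reported. A pattern match only flags a potential site, not a residue known to be glycosylated, and the function name is illustrative.

```python
import re

# PROSITE PS00001 (N-glycosylation site): N-{P}-[ST]-{P}.
# The lookahead makes each match zero-width so overlapping sites are found.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(protein: str) -> list:
    """Return 1-based positions of Asn residues that start an N-{P}-[ST]-{P} site."""
    return [m.start() + 1 for m in N_GLYCOSYLATION.finditer(protein.upper())]
```

For the hypothetical peptide `"MNVTANGSA"` the sketch reports sites at residues 2 and 6, while `"MNPTA"` yields none because proline follows the asparagine.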

Throughout this application, various publications, patents, and published patent
applications are cited. The disclosures of the publications, patents, and published patent
specifications referenced in this application are all hereby incorporated by reference in their
entireties into the present disclosure to more fully describe the state of the art to which this
invention pertains.

EXAMPLES

Several of the methods of the present invention are described in the following 
examples, which are offered by way of illustration and not by way of limitation. Many other 
modifications and variations of the invention as herein set forth can be made without departing
from the spirit and scope thereof and therefore only such limitations should be imposed as are 
indicated by the appended claims. 

Example 1 

Identification Of Biallelic Markers - DNA Extraction
Donors were unrelated and healthy. They presented sufficient diversity to be
representative of a heterogeneous population. The DNA from 100 individuals was extracted
and tested for the detection of the biallelic markers.
30 ml of peripheral venous blood were taken from each donor in the presence of EDTA.
Cells (pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were
lysed by a lysis solution (50 ml final volume: 10 mM Tris pH 7.6; 5 mM MgCl2; 10 mM NaCl).
The solution was centrifuged (10 minutes, 2000 rpm) as many times as necessary to eliminate
the residual red cells present in the supernatant, after resuspension of the pellet in the lysis
solution.

The pellet of white cells was lysed overnight at 42°C with 3.7 ml of lysis solution
composed of:

- 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M
- 200 µl SDS 10%
- 500 µl K-proteinase (2 mg K-proteinase in TE 10-2 / NaCl 0.4 M).

For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) was added. After
vigorous agitation, the solution was centrifuged for 20 minutes at 10000 rpm. 

For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the
previous supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA
solution was rinsed three times with 70% ethanol to eliminate salts, and centrifuged for 20
minutes at 2000 rpm. The pellet was dried at 37°C, and resuspended in 1 ml TE 10-1 or 1 ml
water. The DNA concentration was evaluated by measuring the OD at 260 nm (1 unit OD = 50
µg/ml DNA). To determine the presence of proteins in the DNA solution, the OD 260 / OD 280
ratio was determined. Only DNA preparations having an OD 260 / OD 280 ratio between 1.8
and 2 were used in the subsequent examples described below.
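The quantitation and purity check described above reduce to two small calculations. The Python sketch below uses the standard conversion (1 OD260 unit corresponds to about 50 µg/ml of double-stranded DNA) and the 1.8 to 2.0 acceptance window for the OD260/OD280 ratio; the dilution factor parameter and the readings in the usage note are hypothetical.

```python
# 1 OD260 unit corresponds to about 50 ug/ml of double-stranded DNA.
UG_PER_ML_PER_OD260 = 50.0


def dna_concentration(od260: float, dilution_factor: float = 1.0) -> float:
    """DNA concentration in ug/ml from an OD260 reading of a diluted sample."""
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def acceptable_purity(od260: float, od280: float,
                      low: float = 1.8, high: float = 2.0) -> bool:
    """True when the OD260/OD280 ratio falls inside the accepted window."""
    return low <= od260 / od280 <= high
```

For instance, a 1:10 dilution reading 0.25 at 260 nm corresponds to 125 µg/ml, i.e. about 125 µg of DNA in the 1 ml resuspension volume; a preparation reading 1.0 at 260 nm and 0.7 at 280 nm (ratio about 1.43) would be rejected.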

The pool was constituted by mixing equivalent quantities of DNA from each individual. 

Example 2

Identification Of Biallelic Markers: Amplification Of Genomic DNA By PCR 

The amplification of specific genomic sequences of the DNA samples of Example 1 
was carried out on the pool of DNA obtained previously. In addition, 50 individual samples 
were similarly amplified. 
PCR assays were performed using the following protocol:



Final volume                                           25 µl
DNA                                                    2 ng/µl
MgCl2                                                  2 mM
dNTP (each)                                            200 µM
Primer (each)                                          2.9 ng/µl
AmpliTaq Gold DNA polymerase                           0.05 unit/µl
PCR buffer (10x = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl)    1x
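The final concentrations listed above can be converted into pipetting volumes for one 25 µl reaction with C1 × V1 = C2 × V2. In the Python sketch below, every stock concentration is a hypothetical placeholder, since the example does not state them: 10x buffer, 25 mM MgCl2, a dNTP mix at 10 mM of each nucleotide, primers at 100 ng/µl, enzyme at 5 units/µl and template DNA at 10 ng/µl. Water makes up the remainder.

```python
FINAL_VOLUME_UL = 25.0

# name: (final concentration, hypothetical stock concentration, additions)
# Units within a row match, so they cancel in C1*V1 = C2*V2.
COMPONENTS = {
    "PCR buffer (x)": (1.0, 10.0, 1),
    "MgCl2 (mM)": (2.0, 25.0, 1),
    "dNTP mix, each dNTP (uM)": (200.0, 10000.0, 1),
    "primer pair, each (ng/ul)": (2.9, 100.0, 2),   # one addition per primer
    "AmpliTaq Gold (units/ul)": (0.05, 5.0, 1),
    "genomic DNA (ng/ul)": (2.0, 10.0, 1),
}


def reaction_volumes(components, final_volume_ul):
    """Volume (ul) of each stock to add, plus water to reach final_volume_ul."""
    volumes = {}
    for name, (final_conc, stock_conc, additions) in components.items():
        volumes[name] = additions * final_conc * final_volume_ul / stock_conc
    volumes["water"] = final_volume_ul - sum(volumes.values())
    if volumes["water"] < 0:
        raise ValueError("stocks too dilute for this final volume")
    return volumes


for name, vol in reaction_volumes(COMPONENTS, FINAL_VOLUME_UL).items():
    print(f"{name:28s} {vol:6.2f} ul")
```

With these placeholder stocks, the reagents add up to 11.7 µl and about 13.3 µl of water completes each reaction; different stocks only change the numbers, not the calculation.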



Each pair of first primers was designed using the sequence information of the genomic
DNA sequences of SEQ ID Nos. 1 to 26, 36 to 40 and 54 to 229 disclosed herein and the OSP
software (Hillier & Green, 1991). Each primer of this first pair was about 20 nucleotides in length
and had the sequence located at the positions disclosed in Table 6a in the columns labeled
"Position range of amplification primer in SEQ ID No." and "Complementary position range of
amplification primer in SEQ ID No.".



Table 6a

Amplicon     SEQ ID  Primer  Position range of      Primer  Complementary position range
             No.     name    amplification primer   name    of amplification primer
                             in SEQ ID No.                  in SEQ ID No.
99-27943     1       B1      7938-7958              C1      8446-8465
8-121        1       B2      14699-14718            C2      15100-15118
99-27935     1       B3      21365-21385            C3      21845-21864
8-122        1       B4      25409-25426            C4      [illegible]-25844
[illegible]  1       B5      29349-29366            C5      29684-29701
[illegible]  1       B6      29900-29919            C6      30340-30356
99-34243     1       B7      49219-49239            C7      49664-49684
8-127        1       B8      64639-64657            C8      64981-64999
8-128        1       B9      65453-65471            C9      65856-65874
8-129        1       B10     65547-65566            C10     65949-65966
99-34240     1       B11     75629-75649            C11     76140-76158
99-31959     1       B12     94254-94273            C12     94683-94703
99-31960     1       B13     95034-95053            C13     95543-95563
99-31962     1       B14     96707-96727            C14     97222-97242
99-44282     1       B15     106357-106377          C15     106805-106822
99-24656     1       B16     107022-107040          C16     107495-107513
99-24636     1       B17     107132-107152          C17     107613-107630
99-31939     1       B18     108425-108444          C18     108916-108935
99-44281     1       B19     109333-109353          C19     109848-109868
99-31941     1       B20     112149-112169          C20     112720-112740
99-31942     1       B21     115144-115162          C21     115617-115637
99-24635     1       B22     155353-155373          C22     155805-155822
99-16059     1       B23     157860-157878          C23     158296-158316
99-24634     1       B24     160770-160787          C24     161240-161257
99-24639     1       B25     160279-160298          C25     160785-160802
99-7652      1       B26     168813-168830          C26     169331-169351
99-16100     1       B27     170666-170686          C27     171153-171173
99-5862      1       B28     173065-173085          C28     173495-173514
99-16083     1       B29     173830-173850          C29     174309-174327
99-16044     1       B30     175453-175470          C30     175881-175901
99-16042     1       B31     180464-180481          C31     180991-181008
99-5919      1       B32     189753-189771          C32     190187-190207
99-24658     1       B33     197116-197135          C33     197555-197572
99-30364     1       B34     198666-198684          C34     199148-199168
99-30366     1       B35     200145-200162          C35     200663-200683
99-16094     1       B36     204263-204282          C36     204643-204662
99-24644     1       B37     204741-204758          C37     205222-205240
99-16107     1       B38     206103-206120          C38     206548-206568
99-15873     1       B39     211454-211471          C39     211893-211910
8-124        1       B40     214564-214581          C40     214965-214983
8-125        1       B41     215506-215525          C41     215924-215942
8-132        1       B42     215628-215647          C42     215998-216016
99-13929     1       B43     215749-215769          C43     216210-216228
8-131        1       B44     216473-216491          C44     216883-216900
8-130        1       B45     216683-216702          C45     217091-217109
8-209        1       B46     217119-217136          C46     217521-217539
99-5897      1       B47     219408-219425          C47     219882-219899
99-24649     1       B48     220505-220522          C48     221004-221021
8-199        1       B49     221384-221402          C49     221807-221824
8-198        1       B50     221740-221759          C50     222167-222185
8-195        1       B51     222696-222713          C51     223073-223093
99-13925     1       B52     223499-223518          C52     224013-224033
8-192        1       B53     225103-225120          C53     225505-225524
99-16090     1       B54     225995-226013          C54     226510-226530
8-189        1       B55     226211-226230          C55     226615-226632
8-188        1       B56     226569-226588          C56     226988-227005
8-187        1       B57     226915-226934          C57     227319-[illegible]
8-185        1       B58     227468-227487          C58     227888-227907
99-16051     1       B59     227768-227788          C59     228214-[illegible]
8-184        1       B60     227832-227849          C60     228234-[illegible]
8-183        1       B61     228209-228227          C61     228635-[illegible]


8-181        1       B62     228898-228917          C62     [illegible]
8-180        1       B63     229443-229462          C63     [illegible]-229642
8-179        1       B64     229442-229459          C64     [illegible]-229874
8-143        1       B65     229487-229506          C65     [illegible]-229913
8-178        1       B66     229739-229756          C66     [illegible]-230159
8-177        1       B67     230097-230115          C67     [illegible]-230536
8-119        1       B68     230210-230227          C68     230622-230641
8-138        1       B69     230517-230536          C69     230899-230917
8-175        1       B70     230705-230724          C70     [illegible]-231144
99-15870     1       B71     231278-231298          C71     231729-231747
8-142        1       B72     231084-231103          C72     [illegible]-231503
8-145        1       B73     231588-231605          C73     [illegible]-232007
8-171        1       B74     232147-232166          C74     [illegible]-232566
8-170        1       B75     232405-232423          C75     [illegible]-232849
8-169        1       B76     232744-232762          C76     [illegible]-233163
8-168        1       B77     233056-233074          C77     [illegible]-233479
8-235        1       B78     233314-233334          C78     233785-[illegible]
8-137        1       B79     234039-234058          C79     234440-[illegible]
8-165        1       B80     234516-234533          C80     234916-[illegible]
99-16087     1       B81     235081-235101          C81     235515-[illegible]
8-157        1       B82     237972-237989          C82     238381-238399
8-155        1       B83     238607-238626          C83     239029-239046
99-16038     1       B84     239405-239425          C84     239862-239880
             1       B85     239606-239624          C85     240012-240029
8-153        1       B86     239651-239670          C86     240058-240075
8-135        1       B87     240356-240375          C87     240691-240708
99-16050     1       B88     240518-240538          C88     240988-241006
8-144        1       B89     240810-240828          C89     241217-241235
8-141        1       B90     241094-241113          C90     241502-241520
99-15880     1       B91     241700-241717          C91     242151-242171
8-140        1       B92     241171-241192          C92     241773-241792
8-940        1       B93     [illegible]-242188     C93     242571-242588
[illegible]  1       B94     [illegible]            C94     [illegible]
             1       B95     [illegible]            C95     [illegible]
             1       B96     [illegible]            C96     [illegible]

99-16055     1       B97     253315-253333          C97     [illegible]-253834
99-16105     1       B98     255697-[illegible]     C98     [illegible]
99-16101     1       B99     258138-258155          C99     258606-258623
99-16033     1       B100    259885-259902          C100    260324-260342
99-15875     1       B101    279626-279644          C101    280154-280173
99-13521     1       B102    287977-287995          C102    288484-288504
8-112        1       B103    292501-292519          C103    292901-292920
8-111        1       B104    295376-295395          C104    295777-295795
8-110        1       B105    295682-295701          C105    296102-296119
8-134        1       B106    295812-295830          C106    296143-296161
99-7462      1       B107    298946-298964          C107    299459-299476
99-16052     1       B108    300153-300170          C108    300660-300680
99-16047     1       B109    311615-311632          C109    312126-312144
99-25993     1       B110    315649-315668          C110    316129-316147
99-25101     1       B111    316925-316943          C111    317378-317395


8-94         162     B112    1250-1267              C112    1651-1669
8-95         161     B113    1125-1144              C113    1526-1543
8-97         160     B114    1249-1268              C114    1581-1598
8-98         159     B115    1135-1154              C115    1550-1568
99-14021     151     B116    1394-1411              C116    1853-1870
99-14364     152     B117    1344-1364              C117    1798-1816
99-15056     115     B118    1098-1118              C118    1582-1599
99-15063     116     B119    1347-1364              C119    1784-1804
99-15065     117     B120    1120-1140              C120    1568-1585
99-15229     157     B121    1419-1437              C121    1893-1912
99-15231     163     B122    1189-1209              C122    1701-1719
99-15232     155     B123    1211-1228              C123    1677-1695
99-15239     164     B124    1139-1156              C124    1579-1599
99-15252     118     B125    1-18                   C125    434-451
99-15253     119     B126    1120-1138              C126    1578-1596
99-15256     120     B127    1110-1127              C127    1548-1565
99-15258     121     B128    1165-1183              C128    1685-1705
99-15261     122     B129    1302-1320              C129    1782-1802
99-15280     123     B130    1070-1087              C130    1590-1610
99-15355     124     B131    1352-1369              C131    1822-[illegible]
99-15663     175     B132    1349-1369              C132    1781-[illegible]
99-15664     176     B133    1184-1203              C133    1667-[illegible]
99-15665     174     B134    1423-1441              C134    1879-[illegible]
99-15668     177     B135    1363-1380              C135    1801-[illegible]
99-15672     173     B136    1120-1138              C136    [illegible]
99-15682     178     B137    1184-1202              C137    1665-[illegible]
99-16081     113     B138    114-131                C138    556-[illegible]
99-16082     114     B139    6-33                   C139    527-547
99-20933     179     B140    1130-1149              C140    1563-1581
99-20977     147     B141    1430-1447              C141    1921-1941
99-20978     148     B142    1124-1144              C142    1571-1589
99-20981     149     B143    1202-1219              C143    1630-1650
99-20983     150     B144    1099-1119              C144    1530-1548
99-22310     154     B145    1183-1203              C145    1630-1648
99-25029     180     B146    1292-1307              C146    1722-1741
99-25224     125     B147    937-955                C147    1446-1466
99-25869     181     B148    1320-1340              C148    1849-1868
99-25881     182     B149    1227-1245              C149    1693-1713
99-25897     183     B150    1242-1262              C150    1736-1756
99-25906     184     B151    1374-1392              C151    1888-1908
99-25917     185     B152    1115-1135              C152    1595-1615
99-25924     186     B153    1287-1306              C153    1717-1736
99-25950     126     B154    1381-1399              C154    [illegible]
99-25961     127     B155    [illegible]-1411       C155    [illegible]
99-25965     128     B156    [illegible]            C156    [illegible]
99-25966     129     B157    [illegible]            C157    [illegible]
99-25967     130     B158    [illegible]            C158    [illegible]
99-25969     131     B159    1171-[illegible]       C159    [illegible]-1700
99-25972     132     B160    [illegible]            C160    1795-1815
99-25974     133     B161    [illegible]-1120       C161    1623-1643
99-25977     134     B162    1191-1211              C162    1710-1730
99-25978     135     B163    1155-1175              C163    1644-1663
99-25979     136     B164    1409-1427              C164    1924-1944
99-25980     137     B165    1332-1352              C165    1817-1837
99-25984     138     B166    1293-1310              C166    1794-1812
99-25985     139     B167    1308-1328              C167    1756-1776
99-25989     140     B168    1346-1366              C168    1880-1898
99-26126     165     B169    1004-1022              C169    1525-1545
99-26138     187     B170    1309-1327              C170    1741-1761
99-26146     188     B171    1314-1334              C171    1746-1764
99-26147     141     B172    1433-1453              C172    1879-1896
99-26150     142     B173    1323-1340              C173    1758-1776
99-26153     143     B174    1458-1476              C174    1885-1905
99-26154     144     B175    1396-1415              C175    1903-1920
99-26156     145     B176    1212-1229              C176    1702-1722
99-26166     166     B177    1237-1257              C177    1739-1757
99-26167     167     B178    [illegible]            C178    [illegible]-1778
99-26169     168     B179    [illegible]            C179    [illegible]
99-26171     169     B180    1431-[illegible]       C180    [illegible]




99-26183     170     B181    1348-1367              C181    1798-1818
99-26189     189     B182    1215-1235              C182    1644-1664
99-26190     190     B183    1071-1091              C183    1502-1520
99-26191     191     B184    1095-1115              C184    1539-1558
99-26201     192     B185    1304-1324              C185    1749-1767
99-26222     193     B186    1354-1373              C186    1843-1863
99-27345     221     B220    1139-1159              C220    1672-1689
99-27349     222     B221    1337-1355              C221    1748-1767
99-27352     223     B222    1250-1269              C222    1677-1697
99-27353     224     B223    1085-1105              C223    1584-1604
99-27360     225     B224    1361-1381              C224    1793-1812
99-27361     226     B225    1322-1340              C225    1815-1834
99-27365     227     B226    1081-1099              C226    1590-1609
99-27680     228     B227    1-18                   C227    509-526
99-27912     229     B228    1230-1250              C228    1659-1679
99-30329     112     B229    1-18                   C229    496-514
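Each row of Table 6a fixes the size of the product it yields: it spans from the first base of the B primer to the last base of the C primer. A minimal sketch, assuming 1-based inclusive coordinates within the same SEQ ID No. and the 18-nt PU and RP tails given in this example; the helper name is illustrative.

```python
PU_TAIL_LEN = 18  # length of the PU tail, TGTAAAACGACGGCCAGT
RP_TAIL_LEN = 18  # length of the RP tail, CAGGAAACAGCTATGACC


def amplicon_length(b_start: int, c_end: int, include_tails: bool = False) -> int:
    """Product length from the first base of primer B to the last base of primer C.

    Assumes 1-based, inclusive coordinates in the same SEQ ID No., as in Table 6a.
    """
    if c_end < b_start:
        raise ValueError("C-primer end must lie downstream of B-primer start")
    length = c_end - b_start + 1
    if include_tails:
        length += PU_TAIL_LEN + RP_TAIL_LEN
    return length


if __name__ == "__main__":
    # Row B1/C1 of Table 6a: B1 starts at 7938, C1 ends at 8465 (SEQ ID No. 1)
    print(amplicon_length(7938, 8465))                      # 528
    print(amplicon_length(7938, 8465, include_tails=True))  # 564
```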



Preferably, the primers contained a common oligonucleotide tail, useful for sequencing,
upstream of the specific bases targeted for amplification.

Primers from the column labeled "Position range of amplification primer in SEQ ID
No." contain the following additional PU 5' sequence: TGTAAAACGACGGCCAGT (SEQ ID
No. 126); primers from the column labeled "Complementary position range of amplification
primer in SEQ ID No." contain the following RP 5' sequence: CAGGAAACAGCTATGACC
(SEQ ID No. 127).
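Put together with the position ranges of Table 6a, the tails give the full oligonucleotides to synthesize. A sketch, assuming 1-based inclusive ranges and reading the "complementary" C primer as the reverse complement of its range (it anneals to the opposite strand); the toy template is hypothetical and not taken from any SEQ ID.

```python
PU_TAIL = "TGTAAAACGACGGCCAGT"  # added 5' to primers in the "Position range" column
RP_TAIL = "CAGGAAACAGCTATGACC"  # added 5' to primers in the "Complementary" column

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tailed_primers(template: str, b_range: tuple[int, int],
                   c_range: tuple[int, int]) -> tuple[str, str]:
    """Build tailed primer sequences from 1-based inclusive ranges on `template`.

    The B primer is read directly from the template; the C primer is the reverse
    complement of its range, because it anneals to the opposite strand.
    """
    b_start, b_end = b_range
    c_start, c_end = c_range
    forward = template[b_start - 1:b_end]
    reverse = reverse_complement(template[c_start - 1:c_end])
    return PU_TAIL + forward, RP_TAIL + reverse


if __name__ == "__main__":
    toy = "ACGTACGTTTGGCCAA"  # hypothetical 16-nt template
    print(tailed_primers(toy, (1, 4), (13, 16)))
    # ('TGTAAAACGACGGCCAGTACGT', 'CAGGAAACAGCTATGACCTTGG')
```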

The synthesis of these primers was performed following the phosphoramidite method, on a 
GENSET UFPS 24.1 synthesizer. 

DNA amplification was performed on a Genius II thermocycler. After an initial heating step
at 95°C for 10 min, 40 cycles were performed, each comprising 30 sec at 95°C, 1 min at 54°C
and 30 sec at 72°C. A final elongation of 10 min at 72°C ended the amplification. The
quantities of amplification products obtained were determined on 96-well microtiter plates,
using a fluorometer and PicoGreen (Molecular Probes) as the intercalating agent.
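The cycling program above fixes the nominal hold times, and adding them up gives the length of a run. The sketch ignores ramp times between temperatures (an assumption, since ramp rates depend on the instrument); the helper name is illustrative.

```python
def run_time_s(initial_hold_s: int, cycle_steps_s: list[int],
               n_cycles: int, final_hold_s: int) -> int:
    """Nominal run time in seconds, ignoring ramp times between temperatures."""
    return initial_hold_s + n_cycles * sum(cycle_steps_s) + final_hold_s


if __name__ == "__main__":
    total = run_time_s(
        initial_hold_s=10 * 60,      # 95 C, 10 min
        cycle_steps_s=[30, 60, 30],  # 95 C 30 s, 54 C 1 min, 72 C 30 s
        n_cycles=40,
        final_hold_s=10 * 60,        # 72 C, 10 min
    )
    print(total, total / 60)  # 6000 100.0
```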

Example 3 
Identification of Polymorphisms 

a) Identification of Biallelic Markers from Amplified Genomic DNA of Example 2 

The sequencing of the amplified DNA obtained in Example 2 was carried out on ABI 
377 sequencers. The sequences of the amplification products were determined using automated 
dideoxy terminator sequencing reactions with a dye terminator cycle sequencing protocol. The 
products of the sequencing reactions were run on sequencing gels and the sequences were 
determined using gel image analysis (ABI Prism DNA Sequencing Analysis software (2.1.2 
version)). 

The sequence data were further evaluated to detect the presence of biallelic markers
within the amplified fragments. The polymorphism search was based on the presence of
superimposed peaks in the electrophoresis pattern resulting from different bases occurring at the
same position as described previously.
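On pooled DNA, a polymorphic position shows up as two overlapping peaks of comparable height. The sketch below flags such positions from per-base peak heights; it is a simplified illustration, not the analysis software used here, and both thresholds (a minor peak at least 30% of the major one, a major peak of at least 100 units) are hypothetical.

```python
def find_superimposed_peaks(traces: list[dict[str, float]],
                            min_ratio: float = 0.3,
                            min_height: float = 100.0) -> list[tuple[int, str, str]]:
    """Flag positions where a second base's peak is a substantial fraction of the main peak.

    `traces` holds one {base: peak height} dict per called position (1-based).
    Returns (position, major base, minor base) for each candidate polymorphism.
    """
    candidates = []
    for pos, heights in enumerate(traces, start=1):
        ranked = sorted(heights.items(), key=lambda kv: kv[1], reverse=True)
        (major, h1), (minor, h2) = ranked[0], ranked[1]
        if h1 >= min_height and h2 >= min_ratio * h1:
            candidates.append((pos, major, minor))
    return candidates


if __name__ == "__main__":
    # Hypothetical peak heights for three consecutive positions
    traces = [
        {"A": 900, "C": 20, "G": 15, "T": 10},   # clean A
        {"A": 30, "C": 500, "G": 450, "T": 25},  # C and G superimposed
        {"A": 50, "C": 40, "G": 30, "T": 20},    # too weak to call
    ]
    print(find_superimposed_peaks(traces))  # [(2, 'C', 'G')]
```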

The locations of the biallelic markers detected in the amplified fragments are shown
below in Table 6b.



Table 6b
Biallelic Markers

Amplicon   BM    Marker Name    Polymorphism       SEQ ID  BM position  Position of probes  Probe
                                All1      All2     No.     in SEQ ID    in SEQ ID No.       No.
99-27943   A1    99-27943-150   G         C        1       8316         8304-8328           P1
8-121      A2    8-121-28       A         T        1       14726        14714-14738         P2
8-121      A3    8-121-36       C         T        1       14734        14722-14746         P3
8-121      A4    8-121-154      A         T        1       14852        14840-14864         P4
8-121      A5    8-121-187      A         C        1       14885        14873-14897         P5
8-121      A6    8-121-243      G         T        1       14941        14929-14953         P6
8-121      A7    8-121-281      A         C        1       14979        14967-14991         P7
8-121      A8    8-121-352      C         T        1       15050        15038-15062         P8
8-121      A9    8-121-364      C         T        1       15062        15050-15074         P9
8-121      A10   8-121-371      A         G        1       15069        15057-15081         P10
99-27935   A11   99-27935-193   G         C        1       21672        21660-21684         P11
8-122      A12   8-122-72       A         T        1       25480        25468-25492         P12
8-122      A13   8-122-100      C         T        1       25508        25496-25520         P13
8-122      A14   8-122-271      deletion of CAAA
           A216                                                                             P216
           A217                                    1       241752       241740-241764       P217
           A218                                    1       241861       241849-241873       P218
           A219                                    1       242402       242390-242414       P219
8-225      A220  8-225-281      A         T        1       244313       244301-244325       P220
99-25940   A221  99-25940-186   A         G        1       247860       247848-247872       P221
99-25940   A222  99-25940-182   C         T        1       247864       247852-247876       P222
99-16032   A223  99-16032-292   G         T        1       248315       248303-248327       P223
99-16055   A224  99-16055-216   A         G        1       253619       253607-253631       P224
99-16105   A225  99-16105-152   A         G        1       255848       255836-255860       P225
99-16101   A226  99-16101-436   C         T        1       258573       258561-258585       P226
99-16033   A227  99-16033-244   A         G        1       260099       260087-260111       P227
99-15875   A228  99-15875-165   C         T        1       279789       279777-279801       P228
99-13521   A229  99-13521-31    A         G        1       288007       287995-288019       P229
8-112      A230  8-112-241      C         T        1       292680       292668-292692       P230
8-112      A231  8-112-155      A         C        1       292766       292754-292778       P231
8-112      A232  8-112-45       A         T        1       292876       292864-292888       P232
8-111      A233  8-111-301      deletion of AGAT   1       295491       295479-295503       P233
8-110      A234  8-110-404      G         C        1       295716       295704-295728       P234
8-110      A235  8-110-89       A         G        1       296031       296019-296043       P235
8-134      A236  8-134-94       C         T        1       296068       296056-296080       P236
99-7462    A237  99-7462-508    C         T        1       298969       298957-298981       P237
99-16052   A238  99-16052-214   A         G        1       300365       300353-300377       P238
99-16047   A239  99-16047-115   A         G        1       312030       312018-312042       P239
99-25993   A240  99-25993-280   G         C        1       315928       315916-315940       P240
99-25993   A241  99-25993-367   A         G        1       316014       316002-316026       P241
99-25101   A242  99-25101-151   A         G        1       317245       317233-317257       P242


8-94       A243  8-94-252       A         G        162     1501         1489-1513           P243
8-95       A244  8-95-43        T         C        161     1501         1489-1513           P244
8-97       A245  8-97-98        G         A        160     1501         1489-1513           P245
8-98       A246  8-98-68        T         C        159     1501         1489-1513           P246
99-14021   A247  99-14021-108   A         G        151     1501         1489-1513           P247
99-14364   A248  99-14364-415   G         A        152     1501         1489-1513           P248
99-15056   A249  99-15056-99    G         A        115     1501         1489-1513           P249
99-15063   A250  99-15063-155   A         C        116     1501         1489-1513           P250
99-15065   A251  99-15065-85    C         G        117     1501         1489-1513           P251
99-15229   A252  99-15229-412   T         C        157     1501         1489-1513           P252
99-15231   A253  99-15231-219   T         G        163     1501         1489-1513           P253
99-15232   A254  99-15232-291   G         T        155     1501         1489-1513           P254
99-15239   A255  99-15239-377   G         C        164     1501         1489-1513           P255
99-15252   A256  99-15252-404   C         T        118     404          392-416             P256
99-15253   A257  99-15253-382   C         T        119     1501         1489-1513           P257
99-15256   A258  99-15256-392   C         T        120     1501         1489-1513           P258
99-15258   A259  99-15258-337   G         T        121     1501         1489-1513           P259
99-15261   A260  99-15261-202   A         G        122     1501         1489-1513           P260
99-15280   A261  99-15280-432   C         T        123     1501         1489-1513           P261
99-15355   A262  99-15355-150   C         T        124     1501         1489-1513           P262
99-15663   A263  99-15663-298   G         A        175     1501         1489-1513           P263
99-15664   A264  99-15664-185   C         A        176     1501         1489-1513           P264
99-15665   A265  99-15665-398   T         C        174     1501         1489-1513           P265
99-15668   A266  99-15668-139   C         T        177     1501         1489-1513           P266
99-15672   A267  99-15672-166   G         A        173     1501         1489-1513           P267
99-15682   A268  99-15682-318   A         T        178     1501         1489-1513           P268
99-16081   A269  99-16081-217   C         T        113     330          318-342             P269
99-16082   A270  99-16082-218   A         G        114     233          221-245             P270
99-20933   A271  99-20933-81              G        179     1501         1489-1513           P271
99-20977   A272  99-20977-72    A         C        147     1501         1489-1513           P272
99-20978   A273  99-20978-89    C         G        148     1501         1489-1513           P273
99-20981   A274  99-20981-300   A         G        149     1501         1489-1513           P274
99-20983   A275  99-20983-48              C        150     1501         1489-1513           P275
99-22310   A276  99-22310-148   G         A        154     1501         1489-1513           P276
99-25029   A277  99-25029-241   G         A        180     1501         1489-1513           P277
99-25224   A278  99-25224-189   A         G        125     1126         1114-1138           P278
99-25869   A279  99-25869-182   A         C        181     1501         1489-1513           P279
99-25881   A280  99-25881-275   G                  182     1501         1489-1513           P280
99-25897   A281  99-25897-264   A                  183     1501         1489-1513           P281
99-25906   A282  99-25906-131   G                  184     1501         1489-1513           P282
99-25917   A283  99-25917-115   G         A        185     1501         1489-1513           P283
99-25924   A284  99-25924-215   G         C        186     1501         1489-1513           P284
99-25950   A285  99-25950-121   G         C        126     1501         1489-1513           P285
99-25961   A286  99-25961-376   T         G        127     1501         1489-1513           P286
99-25965   A287  99-25965-399   T         C        128     1501         1489-1513           P287
99-25966   A288  99-25966-241   T         C        129     1501         1489-1513           P288
99-25967   A289  99-25967-57    T         C        130     1501         1489-1513           P289
99-25969   A290  99-25969-200   C         A        131     1501         1489-1513           P290
99-25972   A291  99-25972-317   G         A        132     1501         1489-1513           P291
99-25974   A292  99-25974-143   T         C        133     1501         1489-1513           P292
99-25977   A293  99-25977-311   A         G        134     1501         1489-1513           P293
99-25978   A294  99-25978-166   T         C        135     1501         1489-1513           P294
99-25979   A295  99-25979-93    A         G        136     1501         1489-1513           P295
99-25980   A296  99-25980-173   A         T        137     1501         1489-1513           P296
99-25984   A297  99-25984-312   G         A        138     1501         1489-1513           P297
99-25985   A298  99-25985-194   C         T        139     1501         1489-1513           P298
99-25989   A299  99-25989-398   T         C        140     1501         1489-1513           P299
99-26126   A300  99-26126-498   A         G        165     1501         1489-1513           P300
99-26138   A301  99-26138-193   C         T        187     1501         1489-1513           P301
99-26146   A302  99-26146-264   C         A        188     1501         1489-1513           P302
99-26147   A303  99-26147-396   G         A        141     1501         1489-1513           P303
99-26150   A304  99-26150-276   T         C        142     1501         1489-1513           P304
99-26153   A305  99-26153-44    A         C        143     1501         1489-1513           P305
99-26154   A306  99-26154-107   G         T        144     1501         1489-1513           P306
99-26156   A307  99-26156-290   A         C        145     1501         1489-1513           P307
99-26166   A308  99-26166-257   G         A        166     1501         1489-1513           P308
99-26167   A309  99-26167-278   T         C        167     1501         1489-1513           P309
99-26169   A310  99-26169-211   T         C        168     1501         1489-1513           P310
99-26171   A311  99-26171-71    A         G        169     1501         1489-1513           P311
99-26183   A312  99-26183-156   C         T        170     1501         1489-1513           P312
99-26189   A313  99-26189-164   C         A        189     1501         1489-1513           P313
99-26190   A314  99-26190-20    C         A        190     1501         1489-1513           P314
99-26191   A315  99-26191-58    G         A        191     1501         1489-1513           P315
99-26201   A316  99-26201-267   C         G        192     1501         1489-1513           P316
99-26222   A317  99-26222-149   A         G        193     1501         1489-1513           P317
99-27349   A352  99-27349-267   G         A        222     1501         1489-1513           P352
99-27352   A353  99-27352-197   C         G        223     1501         1489-1513           P353
99-27353   A354  99-27353-105   T         C        224     1501         1489-1513           P354
99-27360   A355  99-27360-142   G         T        225     1501         1489-1513           P355
99-27361   A356  99-27361-181   A         G        226     1501         1489-1513           P356
99-27365   A357  99-27365-421   C         T        227     1501         1489-1513           P357
99-27680   A358  99-27680-484   G         T        228     484          472-496             P358
99-27912   A359  99-27912-272   C         T        229     1501         1489-1513           P359
99-30329   A360  99-30329-380   C         T        112     380          368-392             P360
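In every legible row of Table 6b, the probe spans 12 nucleotides on each side of the biallelic marker, i.e. a 25-mer centered on the polymorphic base. The sketch below reproduces that relationship; the 12-nt flank is read off the table rather than stated in the text, and the function name is illustrative.

```python
PROBE_FLANK = 12  # nucleotides on each side of the marker, as observed in Table 6b


def probe_range(bm_position: int, flank: int = PROBE_FLANK) -> tuple[int, int]:
    """Return the (start, end) of a probe centered on a biallelic marker."""
    return bm_position - flank, bm_position + flank


if __name__ == "__main__":
    # Marker positions taken from Table 6b (A1, A256, A278 and the 1501 rows)
    for bm in (8316, 404, 1126, 1501):
        start, end = probe_range(bm)
        print(bm, start, end, end - start + 1)  # probe length is always 25
```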



Certain biallelic markers of the invention are insertions or deletions, as indicated above.
In particular, the deletion of the nucleotides AGAT (A233, biallelic marker 8-111-301) in Table
6b above may comprise a single deletion of the AGAT motif, or deletions of two or more
AGAT motifs. This marker (A233) may thus also serve as a microsatellite marker.
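Treating such a marker as a microsatellite amounts to counting how many tandem copies of the motif each allele carries. A sketch using Python's standard `re` module; the allele sequences are hypothetical, not taken from SEQ ID No. 1. Because AGAT has no internal self-overlap, scanning for non-overlapping runs finds the longest one.

```python
import re


def longest_tandem_run(seq: str, motif: str = "AGAT") -> int:
    """Return the largest number of consecutive copies of `motif` found in `seq`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    best = 0
    for match in pattern.finditer(seq.upper()):
        best = max(best, len(match.group(0)) // len(motif))
    return best


if __name__ == "__main__":
    # Hypothetical alleles differing by one AGAT copy
    allele_1 = "CCAGATAGATAGATGG"
    allele_2 = "CCAGATAGATGG"
    print(longest_tandem_run(allele_1), longest_tandem_run(allele_2))  # 3 2
```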

BM refers to "biallelic marker". All1 and All2 refer respectively to allele 1 and allele 2
of the biallelic marker.

b) Identification of Polymorphisms by Comparison of Genomic DNA from 
Overlapping BACs 

Genomic DNA from multiple BACs derived from the same DNA donor sample and 
overlapping in regions of genomic DNA of SEQ ID No. 1 was sequenced. Sequencing was 
carried out on ABI 377 sequencers. The sequences of the amplification products were
determined using automated dideoxy terminator sequencing reactions with a dye terminator 
cycle sequencing protocol. The products of the sequencing reactions were run on sequencing 
gels and the sequences were determined using gel image analysis (ABI Prism DNA Sequencing 
Analysis software (2.1.2 version)). 

The sequence data from the overlapping regions of SEQ ID No. 1 were evaluated to 
detect the presence of sequence polymorphisms. The comparison of sequences identified 
sequence polymorphisms including single nucleotide substitutions and deletions, and multiple 
nucleotide deletions. The localization of these polymorphisms within SEQ ID No. 1 is shown 
below in Table 6c. 
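Once the overlapping BAC sequences are aligned, the comparison above reduces to a column-by-column scan that separates single-base substitutions from runs of deleted bases. A sketch, assuming the alignment has already been produced by another tool, that both strings have equal length, and that gaps ('-') occur only in the second sequence; the toy sequences are hypothetical.

```python
def compare_aligned(ref: str, other: str, ref_start: int = 1) -> list[tuple[str, str, int, int]]:
    """Report differences between two pre-aligned, equal-length sequences.

    Returns (type, alleles, start, end) in reference coordinates. Assumes gaps ('-')
    occur only in `other`, so every column maps to a reference position.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to equal length")
    if "-" in ref:
        raise ValueError("this sketch assumes no gaps in the reference")
    events = []
    i = 0
    while i < len(ref):
        if other[i] == "-":
            j = i
            while j < len(ref) and other[j] == "-":
                j += 1  # extend over the whole run of deleted bases
            events.append(("deletion", ref[i:j], ref_start + i, ref_start + j - 1))
            i = j
            continue
        if ref[i] != other[i]:
            events.append(("substitution", f"{ref[i]}/{other[i]}", ref_start + i, ref_start + i))
        i += 1
    return events


if __name__ == "__main__":
    print(compare_aligned("ACGTAAGGTC", "ACCT----TC"))
    # [('substitution', 'G/C', 3, 3), ('deletion', 'AAGG', 5, 8)]
```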

Table 6c

Polymorphisms

Ref. No.  Polymorphism type   Allele 1  Allele 2  Position in SEQ ID No. 1
A361      Deletion            AAGG                61595 to 61598
A362      Deletion            ATTTT               75217 to 75221
A406      Polymorphic base    A         C         148372
A407      Polymorphic base    A         G         149012
A408      Polymorphic base    C         T         149113
A409      Polymorphic base    A         G         151637
A410      Deletion            G                   151748
A411      Polymorphic base    A         G         151769
A412      Polymorphic base    C         T         151847
A413      Polymorphic base    A         C         [illegible]
A414      Polymorphic base    A         G         [illegible]
A415      Polymorphic base    C         T
A416      Polymorphic base    A         G         [illegible]
A417      Polymorphic base    C         T         [illegible]
A418      Polymorphic base              G         [illegible]
A419      Polymorphic base                        154502
A420      Polymorphic base    A         G         154677
A421      Polymorphic base              T         154879
A422      Polymorphic base    [illegible] T       154918
A423      Polymorphic base              T         155802
A424      Polymorphic base    A         G         156448
A425      Polymorphic base    A         C         157238
A426      Polymorphic base    A         G         157897
A427      Polymorphic base    A         G         158172
A428      Polymorphic base    A         G         158302
A429      Deletion            TT                  158510 to 158511
A430      Polymorphic base    C         T         158803
A431      Polymorphic base              T         160172
A432      Polymorphic base              T         160634
A433      Polymorphic base              T         161236
A434      Polymorphic base    A         G         162810
A435      Polymorphic base    A         [illegible] 163007
A436      Polymorphic base    A         [illegible] 164877
A437      Polymorphic base              T         166844
A438      Deletion            [illegible]         166911 to 166914
A439      Polymorphic base    A         G         167754
A440      Polymorphic base    [illegible] T       [illegible]
A441      Polymorphic base              G         [illegible]
A442      Polymorphic base    C         T         [illegible]
A443      Polymorphic base    A         G         [illegible]
A444      Polymorphic base    A         C         [illegible]
A445      Polymorphic base    A         G         [illegible]
A446      Polymorphic base    C         T
A447      Polymorphic base    A         G         [illegible]
A448      Polymorphic base    C         T
A449      Polymorphic base    C         T
A450      Polymorphic base    C         T         170746
A451      Polymorphic base    G         T         170858
A452      Polymorphic base    C         T         170860
A453      Polymorphic base    C         T         170906
A454      Polymorphic base    A         G         171309
A455      Polymorphic base    A         G         171413
A456      Polymorphic base    C         T         171504
A457      Polymorphic base    C         T         171539
A458      Polymorphic base    C         T         171728
A459      Polymorphic base    A         G         171898
A460      Deletion            AA                  172125 to 172126
Example 4

Validation Of The Polymorphisms Through Microsequencing 

The biallelic markers identified in Example 3a were further confirmed and their
respective frequencies were determined through microsequencing. Microsequencing was
carried out for each individual DNA sample described in Example 1.

Amplification from genomic DNA of individuals was performed by PCR as described 
above for the detection of the biallelic markers with the same set of PCR primers (Table 6a). 

The preferred primers used in microsequencing were about 19 nucleotides in length and 
hybridized just upstream of the considered polymorphic base. According to the invention, the 
primers used in microsequencing are detailed in Table 6d. 
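The primer layout described above can be illustrated with a short sketch. In the rows of Table 6d (for example D1 at 8297-8315 and E1 at 8317-8335), the two primers appear to flank a single base, here assumed to be the polymorphic base at position 8316: Mis. 1 ends one base upstream of it, and the complementary Mis. 2 starts one base downstream. The function `primer_ranges` and its fixed 19-nucleotide default are illustrative assumptions, not part of the described method; several primers in Table 6d are 20 nucleotides long.

```python
def primer_ranges(snp_pos: int, length: int = 19) -> tuple:
    """Return 1-based inclusive (start, end) ranges for two microsequencing primers.

    Mis. 1 ends immediately upstream of the polymorphic base; Mis. 2 (read on the
    complementary strand) starts immediately downstream of it.
    """
    if length < 1 or snp_pos <= length:
        raise ValueError("polymorphic base too close to the sequence start for this primer length")
    mis1 = (snp_pos - length, snp_pos - 1)
    mis2 = (snp_pos + 1, snp_pos + length)
    return mis1, mis2


if __name__ == "__main__":
    # Row A1 of Table 6d, with the polymorphic base assumed to lie at 8316.
    print(primer_ranges(8316))  # ((8297, 8315), (8317, 8335))
```

The same call with position 1501 reproduces the 19-nucleotide layout 1482-1500 / 1502-1520 seen in many later rows.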

Table 6d

Marker name | Biallelic marker | SEQ ID No. | Mis. 1 | Position range of microsequencing primer Mis. 1 in SEQ ID No. | Mis. 2 | Complementary position range of microsequencing primer Mis. 2 in SEQ ID No.
99-27943-150 | A1 | [illegible] | D1 | 8297-8315 | E1 | 8317-8335
8-121-28 | A2 | [illegible] | D2 | 14707-14725 | E2 | 14727-14745
8-121-36 | A3 | [illegible] | D3 | 14715-14733 | E3 | 14735-14753
8-121-154 | A4 | [illegible] | D4 | 14833-14851 | E4 | 14853-14871
8-121-187 | A5 | [illegible] | D5 | 14866-14884 | E5 | 14886-14904
8-121-243 | A6 | [illegible] | D6 | 14922-14940 | E6 | 14942-14960
8-121-281 | A7 | [illegible] | D7 | 14960-14978 | E7 | 14980-14998
8-121-352 | A8 | [illegible] | D8 | 15031-15049 | E8 | 15051-15069
8-121-364 | A9 | [illegible] | D9 | 15043-15061 | E9 | 15061-15081
8-121-371 | A10 | [illegible] | D10 | 15050-15068 | E10 | 15070-[illegible]
[illegible] | A11 | [illegible] | D11 | 21653-21671 | E11 | 21673-21691
8-122-72 | A12 | [illegible] | D12 | 25461-25479 | E12 | 25481-25499
[illegible] | A13 | [illegible] | D13 | [illegible]-25507 | E13 | 25509-25527
8-122-221 | A14 | [illegible] | D14 | 25660-25678 | E14 | 25680-25698
8-122-272 | A15 | [illegible] | D15 | 25661-25679 | E15 | 25681-25699
[illegible] | A16 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
295717-295735
8-110-89 | A235 | 1 | D235 | 296012-296030 | E235 | 296032-296050
8-134-94 | A236 | [illegible] | D236 | 296049-296067 | E236 | 296069-296087
99-7462-508 | A237 | [illegible] | D237 | 298950-298968 | E237 | 298970-298988
99-16052-214 | A238 | [illegible] | D238 | 300346-300364 | E238 | 300366-300384
99-16047-115 | A239 | [illegible] | D239 | 312011-312029 | E239 | 312031-312049
8-95-43 | A243 | 162 | D243 | 1482-1500* | E243 | 1502-1521
[illegible] | A244 | 161 | D244 | 1481-1500 | E244 | 1502-1520*
8-97-98 | A245 | 160 | D245 | 1482-1500* | E245 | 1502-1521
8-98-68 | A246 | 159 | D246 | 1481-1500 | E246 | 1502-1520*
99-14021-108 | A247 | 151 | D247 | 1482-1500* | E247 | 1502-1521
99-14364-415 | A248 | 152 | D248 | 1482-1500* | E248 | 1502-1521
99-15056-99 | A249 | 115 | D249 | 1482-1500* | E249 | 1502-1521
99-15063-155 | A250 | 116 | D250 | 1482-1500* | E250 | 1502-1521
99-15065-85 | A251 | 117 | D251 | 1481-1500 | E251 | 1502-1520*
99-15229-412 | A252 | 157 | D252 | 1481-1500 | E252 | 1502-1520*
99-15232-291 | A253 | 163 | D253 | 1481-1500 | E253 | 1502-1520*
99-15239-377 | A254 | 155 | D254 | 1481-1500 | E254 | 1502-1520*
[illegible] | A255 | 164 | D255 | 1482-1500* | E255 | 1502-1521
99-15252-404 | A256 | 118 | D256 | 384-403 | E256 | 405-423*
99-15253-382 | A257 | 119 | D257 | 1481-1500 | E257 | 1502-1520*
99-15256-392 | A258 | 120 | D258 | 1481-1500 | E258 | 1502-1520*
99-15258-337 | A259 | 121 | D259 | 1481-1500 | E259 | 1502-1520*
99-15261-202 | A260 | 122 | D260 | 1482-1500* | E260 | 1502-1521
99-15280-432 | A261 | 123 | D261 | 1481-1500 | E261 | 1502-1520*
99-15355-150 | A262 | 124 | D262 | 1482-1500* | E262 | 1502-1521
99-15663-298 | A263 | 175 | D263 | 1482-1500* | E263 | 1502-1521
99-15664-185 | A264 | 176 | D264 | 1482-1500* | E264 | 1502-1521
99-15665-398 | A265 | 174 | D265 | 1481-1500 | E265 | 1502-1520*
99-15668-139 | A266 | 177 | D266 | 1482-1500* | E266 | 1502-1521
99-15672-166 | A267 | 173 | D267 | 1482-1500* | E267 | 1502-1521
99-15682-318 | A268 | 178 | D268 | 1482-1500* | E268 | 1502-1521
99-16081-217 | A269 | 113 | D269 | 310-329 | E269 | 331-349*
99-16082-218 | A270 | 114 | D270 | 214-232* | E270 | 234-253
99-20933-81 | A271 | 179 | D271 | 1481-1500 | E271 | 1502-1520*
99-20977-72 | A272 | 147 | D272 | 1482-1500* | E272 | 1502-1521
99-20978-89 | A273 | 148 | D273 | 1481-1500 | E273 | 1502-1520*
99-20981-300 | A274 | 149 | D274 | 1481-1500 | E274 | 1502-1520*
99-20983-48 | A275 | 150 | D275 | 1482-1500* | E275 | 1502-1521
99-22310-148 | A276 | 154 | D276 | 1481-1500 | E276 | 1502-1520*
99-25029-241 | A277 | 180 | D277 | 1482-1500* | E277 | 1502-1521
99-25224-189 | A278 | [illegible] | D278 | 1107-1125* | E278 | 1127-1146
99-25869-182 | A279 | 181 | D279 | [illegible]-1500* | E279 | 1502-1521
99-25881-275 | A280 | 182 | D280 | 1481-1500 | E280 | 1502-1520*
99-25897-264 | A281 | 183 | D281 | 1482-1500* | E281 | 1502-1521
99-25906-131 | A282 | 184 | D282 | 1481-1500 | E282 | 1502-1520*
99-25917-115 | A283 | 185 | D283 | 1481-1500 | E283 | 1502-1520*
99-25924-215 | A284 | 186 | D284 | 1482-1500* | E284 | 1502-1521
[illegible] | A285 | 126 | D285 | 1482-1500* | E285 | 1502-1521
99-25961-376 | A286 | 127 | D286 | 1481-1500 | E286 | 1502-1520*
99-25965-399 | A287 | 128 | D287 | 1481-1500 | E287 | 1502-1520*
99-25966-241 | A288 | 129 | D288 | 1481-1500 | E288 | 1502-1520*
99-25967-57 | A289 | 130 | D289 | 1481-1500 | E289 | 1502-1520*
99-25969-200 | A290 | 131 | D290 | 1482-[illegible] | E290 | 1502-1521
99-25972-317 | A291 | 132 | D291 | 1482-1500* | E291 | 1502-1521
99-25974-143 | A292 | 133 | D292 | 1481-1500 | E292 | 1502-1520*
99-25977-311 | A293 | 134 | D293 | 1482-1500* | E293 | 1502-1521
99-25978-166 | A294 | 135 | D294 | 1481-1500 | E294 | 1502-1520*
99-25979-93 | A295 | 136 | D295 | 1482-1500* | E295 | 1502-1521
99-25980-173 | A296 | 137 | D296 | 1482-1500* | E296 | 1502-1521
99-25984-312 | A297 | 138 | D297 | 1482-1500* | E297 | 1502-1521
99-25985-194 | A298 | 139 | D298 | 1481-1500 | E298 | 1502-1520*
99-25989-398 | A299 | 140 | D299 | 1481-1500 | E299 | 1502-1520*
99-26126-498 | A300 | 165 | D300 | 1482-1500* | E300 | 1502-1521
99-26138-193 | A301 | 187 | D301 | 1481-1500 | E301 | 1502-1520*
99-26146-264 | A302 | 188 | D302 | 1482-1500* | E302 | 1502-1521
99-26147-396 | A303 | 141 | D303 | 1482-1500* | E303 | 1502-1521
[illegible] | A304 | 142 | D304 | 1481-1500 | E304 | 1502-1520*
[illegible] | A305 | 143 | D305 | 1482-1500* | E305 | 1502-1521
[illegible] | A306 | 144 | D306 | 1481-1500 | E306 | 1502-1520*
[illegible] | A307 | 145 | D307 | 1482-1500* | E307 | 1502-1521
[illegible] | A308 | 166 | D308 | 1481-1500 | E308 | 1502-1520*
[illegible] | A309 | 167 | D309 | 1482-1500* | E309 | 1502-1521
99-26169-213 | A310 | 168 | D310 | 1482-1500* | E310 | 1502-1521
99-26171-71; 99-26183-156 | A311 | 169 | D311 | 1481-1500 | E311 | 1502-1520*
99-26189-164 | A312 | 170 | D312 | 1482-1500* | E312 | 1502-1521
[illegible] | A313 | 189 | D313 | [illegible] | E313 | 1502-1521
99-26190-20; 99-26191-58 | A314 | 190 | D314 | 1482-[illegible] | E314 | 1502-1521
99-26201-267 | A315 | 191 | D315 | 1481-[illegible] | E315 | 1502-1520*
99-26222-149 | A316 | 192 | D316 | 1481-1500 | E316 | 1502-1520*
99-26223-225 | A317 | 193 | D317 | 1481-1500 | E317 | 1502-1520*
99-26225-148 | A318 | 194 | D318 | 1481-1500 | E318 | 1502-1520*
99-26228-172 | A319 | 195 | D319 | [illegible]-1500 | E319 | 1502-1520*
99-26233-275 | A320 | 196 | D320 | [illegible]-1500* | E320 | 1502-1521
99-26234-336 | A321 | 197 | D321 | [illegible]-1500* | E321 | 1502-1521
[illegible] | A322 | 198 | D322 | [illegible]-1500 | E322 | 1502-1520*
99-26238-186 | A323 | 199 | D323 | [illegible] | E323 | 1502-1520*
99-5873-159 | A324 | 146 | D324 | 1481-1500 | E324 | 1502-1520*
[illegible] | A325 | 171 | D325 | 1481-1500 | E325 | 1502-1520*
99-6012-220 | A326 | 158 | D326 | 1481-1500 | E326 | 1502-1520*
99-6080-99 | A327 | 156 | D327 | 1481-1500 | E327 | 1502-1520*
99-7308-157 | A328 | 153 | D328 | 1482-1500* | E328 | 1502-1521
[illegible] | A329 | 172 | D329 | 1482-1500* | E329 | 1502-1521
[illegible] | A330 | 200 | D330 | 59-78 | E330 | 80-99
[illegible] | A331 | 201 | D331 | 105-124 | E331 | [illegible]-145
[illegible] | A332 | 202 | D332 | 286-305 | E332 | 307-[illegible]
99-26173-470 | A333 | 203 | D333 | 1481-1500 | E333 | 1502-1521
99-26267-524 | A334 | 204 | D334 | 1481-1500 | E334 | 1502-1521
99-26284-394 | A335 | 205 | D335 | 1481-1500 | E335 | 1502-1521
99-26559-315 | A336 | 206 | D336 | 1481-1500 | E336 | 1502-1521
99-26769-256 | A337 | 207 | D337 | 1481-1500 | E337 | 1502-1521
99-26772-268 | A338 | 208 | D338 | 1481-1500 | E338 | 1502-1520*
99-26776-[illegible] | A339 | 209 | D339 | 1481-1500 | E339 | 1502-1521
99-26779-437 | A340 | 210 | D340 | 1477-1496 | E340 | 1498-1517
99-26781-25 | A341 | 211 | D341 | 1482-1500* | E341 | 1502-1521
99-26782-300 | A342 | 212 | D342 | 1482-1500* | E342 | 1502-[illegible]
99-26783-81 | A343 | 213 | D343 | 1481-1500 | E343 | 1502-[illegible]
99-26787-96 | A344 | [illegible] | D344 | [illegible]-1500* | E344 | 1502-1521
99-26789-201 | A345 | 215 | D345 | 1482-1500* | E345 | 1502-1521
99-27297-280 | A346 | 216 | D346 | [illegible]-1500 | E346 | 1502-1521
[illegible] | A347 | 217 | D347 | 1481-1500 | E347 | 1502-1521
[illegible] | A348 | 218 | D348 | 1481-1500 | E348 | 1502-1521
[illegible] | A349 | 219 | D349 | 1481-1500 | E349 | 1502-1521
[illegible] | A350 | 220 | D350 | 1481-1500 | E350 | 1502-1521
[illegible] | A351 | 221 | D351 | 1481-1500 | E351 | 1502-1521
[illegible] | A352 | 222 | D352 | 1482-1500* | E352 | 1502-1521
[illegible] | A353 | 223 | D353 | 1481-1500 | E353 | 1502-1520*
[illegible] | A354 | 224 | D354 | 1482-1500* | E354 | 1502-1521
[illegible] | A355 | 223 | D355 | 1482-1500* | E355 | 1502-1521
99-27361-181 | A356 | 226 | D356 | 1482-1500* | E356 | 1502-1521
99-27365-421 | A357 | 227 | D357 | 1482-1500* | E357 | 1502-1521
99-27680-484 | A358 | 228 | D358 | 464-483 | E358 | 485-504
99-27912-272 | A359 | 229 | D359 | 1481-1500 | E359 | 1502-1521
99-30329-380 | A360 | 112 | D360 | 361-379 | E360 | 381-399


Mis. 1 and Mis. 2 refer, respectively, to microsequencing primers that hybridized with
the coding strand or with the non-coding strand of the nucleotide sequences of the invention.

The microsequencing reaction was performed as follows:
After purification of the amplification products, the microsequencing reaction mixture
was prepared by adding, in a 20 µl final volume: 10 pmol microsequencing oligonucleotide, 1 U
Thermosequenase (Amersham E79000G), 1.25 µl Thermosequenase buffer (260 mM Tris-HCl
pH 9.5, 65 mM MgCl2), and the two appropriate fluorescent ddNTPs (Perkin Elmer, Dye
Terminator Set 401095) complementary to the nucleotides at the polymorphic site of each
biallelic marker tested, following the manufacturer's recommendations. After 4 minutes at
94°C, 20 PCR cycles of 15 sec at 55°C, 5 sec at 72°C, and 10 sec at 94°C were carried out in a
Tetrad PTC-225 thermocycler (MJ Research). The unincorporated dye terminators were then
removed by ethanol precipitation. Samples were finally resuspended in formamide-EDTA
loading buffer and heated for 2 min at 95°C before being loaded on a polyacrylamide
sequencing gel. The data were collected by an ABI PRISM 377 DNA sequencer and processed
using the GENESCAN software (Perkin Elmer).
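As a quick check on the cycling program above, the steps can be written down as data and their hold times summed. This is a sketch only: the `Step` class and the `total_hold_seconds` function are hypothetical, and ramp times between temperatures, which depend on the thermocycler, are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


# Initial denaturation, then 20 cycles of 55°C / 72°C / 94°C holds, as listed in the protocol.
INITIAL = Step(94, 4 * 60)
CYCLE = (Step(55, 15), Step(72, 5), Step(94, 10))
N_CYCLES = 20


def total_hold_seconds(initial: Step, cycle: tuple, n: int) -> int:
    """Sum of hold times only; ramping between temperatures is ignored."""
    return initial.seconds + n * sum(step.seconds for step in cycle)


if __name__ == "__main__":
    secs = total_hold_seconds(INITIAL, CYCLE, N_CYCLES)
    print(secs, "s =", secs / 60, "min")  # 840 s = 14.0 min
```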

Following gel analysis, data were automatically processed with software that allows the
determination of the alleles of biallelic markers present in each amplified fragment.

The software evaluates such factors as whether the intensities of the signals resulting
from the above microsequencing procedures are weak, normal, or saturated, or whether the
signals are ambiguous. In addition, the software identifies significant peaks (according to shape
and height criteria). Among the significant peaks, peaks corresponding to the targeted site are
identified based on their position. When two significant peaks are detected for the same
position, each sample is classified as homozygous or heterozygous based
on the height ratio.
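The classification rule in the last sentence can be sketched as follows. The source does not give the peak-height or ratio cut-offs, so `min_height` and `het_ratio` below are placeholder values chosen for illustration, and the allele labels "A" and "B" are generic.

```python
from typing import Optional


def call_genotype(height_a: float, height_b: float,
                  min_height: float = 100.0,
                  het_ratio: float = 0.3) -> Optional[str]:
    """Classify one sample at one biallelic site from its two allele-specific peak heights.

    Returns "AA", "BB", "AB", or None when neither peak is significant.
    The thresholds are illustrative placeholders, not values from the source.
    """
    a_ok = height_a >= min_height
    b_ok = height_b >= min_height
    if not a_ok and not b_ok:
        return None  # no significant peak at the targeted position: no call
    if a_ok and not b_ok:
        return "AA"
    if b_ok and not a_ok:
        return "BB"
    # Two significant peaks: decide on the ratio of the smaller peak to the larger one.
    ratio = min(height_a, height_b) / max(height_a, height_b)
    if ratio >= het_ratio:
        return "AB"
    return "AA" if height_a > height_b else "BB"
```

For example, peaks of 900 and 850 give "AB", while 900 and 120 give "AA" because the minor peak is too small relative to the major one.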



Example 5a 

Association Study Between Schizophrenia And The Biallelic Markers Of The Invention: 

Collection Of DNA Samples From Affected And Non-Affected Individuals 
A) Affected population

All the samples were collected from a large epidemiological study of schizophrenia
undertaken in hospital centers of Quebec from October 1995 to April 1997. The population was
composed of French Caucasian individuals. The study design consisted of the ascertainment of
cases and two of their first-degree relatives (parents or siblings).

As a whole, 956 schizophrenic cases were ascertained according to the following
inclusion criteria:

- the diagnosis had been made by a psychiatrist;

- the diagnosis had been made at least 3 years before recruitment, in order to
exclude individuals suffering from transient manic-depressive psychosis or depressive
disorders;

- the patient's ancestors had been living in Quebec for at least 6 generations;

- it was possible to obtain a blood sample from 2 close relatives.

Among the 956 ascertained schizophrenic cases, 834 individuals were retained in the
study on the following basis:

- for the retained cases, the diagnosis of schizophrenia was established
according to the DSM-IV (Diagnostic and Statistical Manual, Fourth edition, Revised 1994,
American Psychiatric Press);

- samples from individuals suffering from schizoaffective disorder were discarded;

- individuals suffering from catatonic schizophrenia were also excluded from the
population of schizophrenic cases;

- individuals having a first-degree relative, or 2 or more second-degree relatives,
suffering from depression or mood disorder were also excluded;

- individuals having had severe head trauma, severe obstetrical complications,
encephalitis, or meningitis before onset of symptoms were also excluded;

- a patient suffering from epilepsy and treated with anticonvulsants was also excluded
from the population of schizophrenic cases.

The age at onset was not used as an inclusion criterion.
B) Unaffected population

Control cases were ascertained based on the following cumulative criteria:

- the individual must not be affected by schizophrenia or any other psychiatric disorder;
- the individual must be 35 years old or older;

- the individual must belong to the French-Canadian population;

- the individual must have one or two first-degree relatives available for blood sampling.
Controls were matched with cases for sex when possible.

C) Cases and Control Populations Selected for the Association Study
The unaffected population retained for the study was composed of 241 individuals. The
initial sample of the clinical study was composed of 215 cases and 214 controls. The controls
were composed of 116 males and 98 females, while the cases were composed of 154 males and
64 females. For each control, two first-degree relatives (father, mother, sisters and brothers)
were available. In order to match the sex of cases and controls, the parents of female controls
were substituted for the female controls where possible and where the parents were known to be
unaffected by schizophrenia or other psychosis. The parents of 27 female controls were thus
substituted for the respective females, resulting in a total control sample size of 241 individuals.
The composition of the control sample is detailed below in Table 7.



Table 7

Description of control samples

Probands | 187
  Male | 116
  Female | 71
Parents of probands | 54
  Fathers | 27
  Mothers | 27
Total | 241
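The totals in Table 7 follow directly from the substitution described above. The arithmetic, using only counts stated in the text, can be checked as follows:

```python
# Counts stated in the text for the initial control sample.
initial_male = 116
initial_female = 98
substituted_females = 27  # female controls replaced by both of their parents

probands_female = initial_female - substituted_females
probands = initial_male + probands_female
parents = 2 * substituted_females  # one father and one mother per substituted control
total_controls = probands + parents

print(probands_female, probands, parents, total_controls)  # 71 187 54 241
```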







The association data presented below were obtained on the population sizes
detailed in Table 8 below, the individuals having been randomly selected from the
populations detailed above.



Table 8 




close relatives (first or second degree) 



Both case and control populations form two groups, each group consisting of unrelated
individuals who do not share a known common ancestor. Additionally, the individuals of the
control population were selected among those having no family history of schizophrenia or
schizophrenic disorder.

Genotyping of affected and control individuals
A) Results from the genotyping

The general strategy to perform the association studies was to individually scan the
DNA samples from all individuals in each of the populations described above in order to
establish the allele frequencies of biallelic markers, and among them the biallelic markers of the
invention, in the diploid genome of the tested individuals belonging to each of these
populations.

Allelic frequencies of every biallelic marker in each population (cases and controls)
were determined by performing microsequencing reactions on amplified fragments obtained by
genomic PCR performed on the DNA samples from each individual. Genomic PCR and
microsequencing were performed as detailed above in Examples 1 to 3 using the described PCR
and microsequencing primers.

Single biallelic marker frequency analysis
For each allele of the biallelic markers included in this study, the difference between the
allelic frequency in the unaffected population and in the population affected by schizophrenia
was calculated and the absolute value of the difference was determined. The larger the
difference in allelic frequency for a particular biallelic marker or a particular set of biallelic
markers, the more probable an association between the genomic region harboring this particular
biallelic marker or set of biallelic markers and schizophrenia. Allelic frequencies were also
useful to check that the markers used in the haplotype studies meet the Hardy-Weinberg
proportions (random mating).
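A minimal sketch of the two computations described above, assuming genotype counts per population are available: the absolute difference in allele frequency between cases and controls, and a one-degree-of-freedom Pearson chi-square comparing observed genotypes with Hardy-Weinberg expectations. The genotype counts are hypothetical, and the source does not state which Hardy-Weinberg test was actually used.

```python
from typing import NamedTuple


class GenotypeCounts(NamedTuple):
    hom1: int  # homozygous for allele 1
    het: int   # heterozygous
    hom2: int  # homozygous for allele 2


def allele1_frequency(g: GenotypeCounts) -> float:
    """Frequency of allele 1 among the 2N alleles of N diploid individuals."""
    return (2 * g.hom1 + g.het) / (2 * (g.hom1 + g.het + g.hom2))


def hwe_chi_square(g: GenotypeCounts) -> float:
    """Pearson chi-square (1 df) of observed genotypes against Hardy-Weinberg expectations."""
    n = g.hom1 + g.het + g.hom2
    p = allele1_frequency(g)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((obs - exp) ** 2 / exp for obs, exp in zip(g, expected) if exp > 0)


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    cases = GenotypeCounts(hom1=60, het=100, hom2=40)
    controls = GenotypeCounts(hom1=40, het=100, hom2=60)
    delta = abs(allele1_frequency(cases) - allele1_frequency(controls))
    print(round(delta, 3))                     # 0.1
    print(round(hwe_chi_square(controls), 3))  # 0.02
```

A chi-square value above about 3.84 (the 5% point for one degree of freedom) would flag a marker as departing from Hardy-Weinberg proportions.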

The allelic frequencies of biallelic markers in the chromosome 13q31-q33 region
in the affected and the unaffected populations, using the sample population described
above, are set forth in Table 9.

Table 9

Allelic frequencies of markers in different sub-samples (all sample)

Marker | Alleles | Cases: all | Cases: HF+ | Cases: HF- | Controls
99-20978/89 | C/G | 0.51 | 0.47 | 0.51 | 0.55
99-20983/48 | A/G | 0.30 | 0.28 | 0.33 | 0.29
99-20981/300 | A/G | 0.54 | 0.51 | 0.55 | 0.56
99-20977/72 | A/C | 0.40 | 0.41 | 0.38 | 0.35
99-6080/99 | C/T | 0.58 | 0.57 | 0.57 | 0.55
99-15229/412 | A/G | 0.54 | 0.52 | 0.55 | 0.53
99-22310/148 | C/T | 0.46 | 0.48 | 0.44 | 0.47
99-15232/291 | C/T | 0.46 | 0.48 | 0.43 | 0.47
99-14021/108 | A/G | 0.46 | 0.48 | 0.44 | 0.47
8-98/68 | A/G | 0.20 | 0.18 | 0.23 | 0.19
8-97/98 | C/T | 0.78 | 0.75 | 0.81 | 0.80
99-6012/220 | C/T | 0.20 | 0.19 | 0.23 | 0.19
8-95/43 | A/G | 0.18 | 0.20 | 0.18 | 0.21
99-7308/157 | C/T | 0.39 | 0.42 | 0.36 | 0.39
99-14364/415 | C/T | 0.38 | 0.40 | 0.36 | 0.39
99-15672/166 | C/T | 0.51 | 0.47 | 0.54 | 0.54
99-15668/139 | C/T | 0.58 | 0.56 | 0.62 | 0.65
99-15665/398 | A/G | 0.72 | 0.67 | 0.72 | 0.76
99-15663/298 | C/T | 0.72 | 0.67 | 0.72 | 0.76
99-15664/185 | C/T | 0.69 | 0.62 | 0.72 | 0.72
In the association study described herein, several individual biallelic markers were
shown to be significantly associated with schizophrenia. In particular, several of the
chromosome 13q31-q33 region biallelic markers (99-16038/118 (A198), 99-15880/162 (A218),
99-5919/215 (A75), 99-15875/165 (A228), 99-16032/292 (A223)) showed significant
association with schizophrenia in both familial and sporadic schizophrenia cases. The
significance of the absolute value of the difference of allelic frequency of the individual biallelic
markers in the affected and the unaffected population is set forth in Figure 2, with several
biallelic markers having allelic frequency differences with p-values approaching or less than
0.05, biallelic marker 99-5919/215 (A75) having a p-value of less than 0.01. Figure 2 also
shows the physical order of certain specific biallelic markers. These results show that several
biallelic markers individually associated with schizophrenia are physically located in a
particular region of significance, the subregion of the chromosome 13q31-q33 region referred to
herein as Region D.

Haplotype frequency analysis

Haplotype analysis for association of chromosome 13q31-q33
related biallelic markers and schizophrenia was performed by estimating the frequencies of all
possible 2-, 3- and 4-marker haplotypes in the affected and control populations described above.
Haplotype estimations were performed by applying the Expectation-Maximization (EM)
algorithm (Excoffier and Slatkin, 1995), using the EM-HAPLO program (Hawley et al., 1994)
as described above. Estimated haplotype frequencies in the affected and control populations
were compared by means of a chi-square statistical test (one degree of freedom).
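For two biallelic markers, the EM approach of Excoffier and Slatkin reduces to a short iteration, because only double heterozygotes have ambiguous phase. The sketch below illustrates that two-locus case only; it is not the EM-HAPLO program, which handles more markers, and the genotype coding (number of copies of allele 1 at each locus) and the example counts are assumptions for illustration.

```python
from typing import Dict, Mapping, Tuple

Haplotype = Tuple[int, int]  # (allele at locus A, allele at locus B), alleles coded 1 or 2
HAPLOTYPES = [(1, 1), (1, 2), (2, 1), (2, 2)]


def _alleles(copies_of_1: int) -> list:
    """2 -> [1, 1], 1 -> [1, 2], 0 -> [2, 2]."""
    return [1] * copies_of_1 + [2] * (2 - copies_of_1)


def em_two_locus(genotypes: Mapping[Tuple[int, int], int],
                 max_iter: int = 200, tol: float = 1e-10) -> Dict[Haplotype, float]:
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    `genotypes` maps (copies of allele 1 at locus A, copies at locus B) to a number of
    individuals. Only double heterozygotes, genotype (1, 1), have ambiguous phase.
    """
    n_chrom = 2 * sum(genotypes.values())
    freq = {h: 0.25 for h in HAPLOTYPES}  # uniform starting values
    for _ in range(max_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for (ga, gb), n in genotypes.items():
            if ga == 1 and gb == 1:
                # E-step: split double heterozygotes between phases 11/22 and 12/21.
                cis = freq[(1, 1)] * freq[(2, 2)]
                trans = freq[(1, 2)] * freq[(2, 1)]
                share = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(1, 1)] += n * share
                counts[(2, 2)] += n * share
                counts[(1, 2)] += n * (1 - share)
                counts[(2, 1)] += n * (1 - share)
            else:
                # At most one heterozygous locus: both haplotypes are determined.
                for h in zip(_alleles(ga), _alleles(gb)):
                    counts[h] += n
        # M-step: new frequencies are expected haplotype counts over all chromosomes.
        new = {h: c / n_chrom for h, c in counts.items()}
        done = max(abs(new[h] - freq[h]) for h in HAPLOTYPES) < tol
        freq = new
        if done:
            break
    return freq


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    sample = {(2, 2): 10, (0, 0): 10, (1, 1): 10}
    print({h: round(f, 3) for h, f in em_two_locus(sample).items()})
```

In this toy sample every unambiguous chromosome carries haplotype 11 or 22, so the iteration assigns the double heterozygotes to the 11/22 phase and the estimates converge to 0.5 for each of those two haplotypes.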
Haplotype association results in schizophrenia cases

The results of the haplotype analysis using the chromosome 13q31-q33-related biallelic
markers are shown in Figure 3. In particular, the figure shows the most
significant haplotypes using the biallelic markers: 99-16047/115 (A239), 99-16033/244 (A227),
99-16038/118 (A198), 99-15875/165 (A228), 99-16032/292 (A223), 99-5897/143 (A107),
99-15880/162 (A218), 99-16082/218 (A270), 99-5919/215 (A75), 99-7652/162 (A62),
99-16100/147 (A65), 99-5862/167 (A70).

A number of biallelic marker haplotypes were shown to be significantly associated with
schizophrenia. A first preferred haplotype (HAP287 of Figure 3) consisting of four biallelic
markers (99-16038/118 (A198), 99-16082/218 (A270), 99-7652/162 (A62) and 99-16100/147
(A65)) is highly significantly associated with schizophrenia in both total cases and sporadic
cases. Figure 4 shows the characteristics of this haplotype. This haplotype presented a p-value
of 3.1 x 10^-7 and an odds ratio of 4.01 for total cases, and a p-value of 3.9 x 10^-6 and an odds ratio
of 3.88 for sporadic cases. Phenotypic permutation tests confirmed the statistical significance
of these results. Estimated haplotype frequencies were 13.8% in total cases, 13.5% in the
sporadic cases, and 3.8% in the controls.
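The odds ratio and the one-degree-of-freedom chi-square comparison mentioned above can be sketched as follows. Recomputing the odds ratio from the rounded frequencies quoted in the text (13.8% in total cases, 3.8% in controls) gives about 4.05, close to the reported 4.01; the small difference is presumably due to rounding of the published frequencies. The 2x2 counts in the second call are hypothetical, since chromosome counts for this analysis are not given here, and the use of Pearson's statistic without continuity correction is an assumption.

```python
import math


def odds_ratio_from_freqs(f_cases: float, f_controls: float) -> float:
    """Odds ratio for carrying a given haplotype on a chromosome, cases versus controls."""
    return (f_cases / (1.0 - f_cases)) / (f_controls / (1.0 - f_controls))


def chi_square_2x2(a: float, b: float, c: float, d: float) -> tuple:
    """Pearson chi-square (1 df, no continuity correction) and its p-value.

    a, b: chromosomes carrying / not carrying the haplotype in cases;
    c, d: the same in controls.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of a chi-square with 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Frequencies quoted in the text for HAP287 (total cases vs controls).
    print(round(odds_ratio_from_freqs(0.138, 0.038), 2))  # 4.05
    # Hypothetical haplotype counts, for illustration only.
    print(chi_square_2x2(10, 20, 20, 10))
```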

Several other significant haplotypes are listed in Figure 3, including several 2-, 3- and
4-marker haplotypes. Considered to be highly significantly associated with schizophrenia are
the most significant 2-marker haplotype (HAP1, consisting of biallelic markers 99-15875/165
(A228) and 99-5919/215 (A75)) and the most significant 3-marker haplotype (HAP67,
consisting of biallelic markers 99-16038/118 (A198), 99-16082/218 (A270) and 99-7652/162
(A218)).

Further preferred haplotypes considered significantly associated with schizophrenia are
those whose p-values fall below a desired threshold level; all the haplotypes listed in
Figure 3 present p-values below 1.0 x 10^-2 for 2-marker haplotypes, 1.0 x 10^-4 for 3-marker
haplotypes, and 1.0 x 10^-5 for 4-marker haplotypes. All of the biallelic markers presented in
Figure 4 except one (99-16047/115 (A239)) are involved in haplotypes meeting
these threshold levels. Figure 3 shows several 2-marker haplotypes, HAP1 to HAP8, having
p-values ranging from 1.0 x 10^-2 to 1.2 x 10^[illegible], several 3-marker haplotypes, HAP67 to HAP76,
having p-values ranging from 1.3 x 10^-5 to 1.0 x 10^-4, and several 4-marker haplotypes, HAP287 to
HAP291, having p-values ranging from 8.2 x 10^-7 to 3.1 x 10^-7. Figure 4 shows biallelic markers
involved in significant haplotypes having significance thresholds of 1.0 x 10^-2, 1.0 x 10^-4, and
1.0 x 10^-5 for 2-, 3- and 4-marker haplotypes, respectively.
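The size-dependent significance thresholds above amount to a simple filter. In this sketch, HAP287 and its p-value for total cases are taken from the text; the other two entries are hypothetical.

```python
# Significance thresholds by number of markers in the haplotype, as stated in the text.
THRESHOLDS = {2: 1.0e-2, 3: 1.0e-4, 4: 1.0e-5}


def passes_threshold(n_markers: int, p_value: float) -> bool:
    """True if a haplotype's p-value is below the threshold for its number of markers."""
    return p_value < THRESHOLDS[n_markers]


# HAP287 (4 markers, p = 3.1e-7 for total cases) is from the text; the others are hypothetical.
candidates = [("HAP287", 4, 3.1e-7), ("hypothetical_3", 3, 2.0e-4), ("hypothetical_2", 2, 5.0e-3)]
print([name for name, k, p in candidates if passes_threshold(k, p)])  # ['HAP287', 'hypothetical_2']
```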
Several 2-, 3- and 4-marker haplotypes, namely HAP1, HAP8, HAP70, HAP71, HAP75,
HAP76, HAP288, HAP290 and HAP291, comprised allele A of biallelic marker 99-5919/215
(A75). Furthermore, several 2-, 3- and 4-marker haplotypes, namely HAP7, HAP67, HAP69,
HAP75, HAP287 and HAP288, comprised allele G of biallelic marker 99-16038/118 (A198).

Example 5b

Association Study Between Schizophrenia And The Biallelic Markers Of The Invention

Collection Of DNA Samples From Affected And Non-Affected Individuals

Biallelic markers of the invention were further analyzed in the French-Canadian
population described above. For this analysis, the proband case population under study
consisted of 139 individuals and the control population consisted of 141 individuals, as described in
Table 10 below.

Table 10

Cases and Control Populations Selected for the Association Study

Sample type | Sample size | Gender: male | Gender: female | Familial history of psychosis (FH)
Cases | 139 | 94 | 45 | [illegible]
Controls | 141 | 96 | [illegible] | [illegible]



Genotyping of affected and control individuals
A) Results from the genotyping

The general strategy for performing the association studies was to individually scan the
DNA samples from all individuals in each of the populations described above in order to
establish the allele frequencies of biallelic markers, and among them the biallelic markers of the
invention, in the diploid genome of the tested individuals belonging to each of these
populations.

Allelic frequencies of every biallelic marker in each population (cases and controls)
were determined by performing microsequencing reactions on amplified fragments obtained by
genomic PCR performed on the DNA samples from each individual. Genomic PCR and
microsequencing were performed as detailed above in Examples 1 to 3 using the described PCR
and microsequencing primers.

Single biallelic marker frequency analysis

For each allele of the biallelic markers included in this study, the difference between the
allelic frequency in the unaffected population and in the population affected by schizophrenia
was calculated and the absolute value of the difference was determined. The allelic frequencies
in the affected and the unaffected populations for the regions studied are set forth in Table 11,
using the sample population described above and in Table 10. The larger the difference in allelic
frequency for a particular biallelic marker or a particular set of biallelic markers, the more
probable an association between the genomic region harboring this particular biallelic marker or
set of biallelic markers and schizophrenia. Allelic frequencies were also useful to check that the
markers used in the haplotype studies meet the Hardy-Weinberg proportions (random mating).



Table 11

Allelic frequencies of markers in different sub-samples (%)

Marker | Polymorphism | Cases: all cases | Cases: HF+ | Cases: HF- | All controls
99-20978/89 | C/G | 50.37 | 47.26 | 54.03 | 55.43
99-20983/48 | A/G | 30.37 | 28.67 | 32.5 | 26.52
99-20977/72 | A/C | 41.01 | 42.11 | 39.68 | 34.4
99-20981/300 | A/G | 52.17 | 51.33 | 53.17 | 60
99-6080/99 | C/T | 58.82 | 58 | 59.84 | 54.85
99-15229/412 | A/G | 54.92 | 52.86 | 57.26 | 51.88
99-22310/148 | C/T | 44.2 | 46.71 | 41.13 | 48.57
99-15232/291 | G/T | 43.85 | 46.43 | 40.83 | 49.28
99-14021/108 | A/G | 44.85 | 47.26 | 42.06 | 48.54
8-94/252 | A/G | 2.22 | 1.97 | 2.54 | 2.52
8-98/68 | A/G | 19.06 | 17.76 | 20.63 | 19.06
8-97/98 | C/T | 76.26 | 74.34 | 78.57 | 77.3
99-6012/220 | G/T | 20 | 18.49 | 21.77 | 18.79
99-7308/157 | C/T | 40.31 | 41.89 | 38.18 | 39.36
99-14364/415 | C/T | 39.93 | 40.79 | 38.89 | 40
8-95/43 | A/G | 20.29 | 20.39 | 20.16 | 22.14
99-15672/166 | C/T | 49.28 | 47.37 | 51.59 | 56.74
99-15668/139 | C/T | 58.21 | 56.16 | 60.66 | 66.67
99-15665/398 | A/G | 70.5 | 67.76 | 73.81 | 76.79
99-15663/298
99-15664/185


C/T 


70,5 


67,76 


73,81 


76,95


99-15682/318


G/T 
A/T 


66,54
35,27 


62,33 
39,58 


71,43 


1 ?2 ' 5 
32,66


99-20933/81
99-26146/264


A/C 


43,12 


42,76 


29,82 
43,55 


42,45




n/T 

VJ/ 1 


39,62


38,67 


40,91 


38,85


99-25922/147
99-16081/217
99-16082/218


G/T 
C/T 


44,19 
42,28 


39,58 
38,82 


50 
46,67 


40,94
36,74


99-24656/260
99-24639/163
99-24634/108


A/G 
A/G 
G/T 
A/T 


34,73 
48,87 
38,52 
44,85 


31,94 
49,32 
33,33 
42,67 


38,14 
48,31 

45 
47,54 


33,81
54,04
40,51


99-7652/162
99-16100/147

99-5919/215
99-24658/410
99-24644/194
99-5897/143
99-24649/186
99-15870/400


A/G 
A/G 
C/T 
A/G 
C/T 
A/G 

A/C

C/T


45,29
44,66
43,53
69,42
64,13
39,42
57,61 
67,75 


44,08 
42,75 
41,45 
71,05 
69,08 
41,22 
60,67 
67,33 


46,77 

77 

46,03 
67,46 
58,06 
37,3 
53,97 
68,25


50,36

48,89
49,29

60,28

61,07 

\ 40,5 1 1 
61,07
62,95 


99-16038/118
99-15880/162
99-25940/182
99-16032/292
99-16033/244
99-15875/165
99-16047/115
99-25993/367


A/G 
A/G 

A/G
A/G
A/C
C/T

C/T
C/T


33,46 
34,53 
65,11 
59,42 
64,03 
54,51 
56,88 
71,69 


36,67 
36,18 
63,16 
56,67 
61,84 
56,76 
57,89 
74,67 


29,51
32,54 
67,46 
62,7 
66,67 
51,69 
55,65
68,03


30,29 

56,43
5?,59

55,67

56,44
75,19


99-25989/398 
99-25979/93


A/G

A/G
A/G


44^3 
32,81 
68,12 


40,79 
33,33 
69,08 


49,18
32,2 


27,86 

6Q ^9 1 


99-25969/200 
99-25966/241
99-25961/376


G/T
A/G


36,67 
66,3 


38,67 
67,11 


66,94
34,17
65,32


38,85
63,21 


99-25965/399
99-25977/311
99-25950/121
99-25974/143


A/C
A/G
A/G
C/G 


50,36 
72,01 
31,75 


42,57 
51,97 
67,76 
36 


36,07 
48,39 
77,59 
26,61


37,31 
49,64 

73,72
27,54 


99-26150/276 


A/G
A/G 


25,55 
46,54 


28,29 
51,43


22,13
40,83 


22,7 
47,76 
99-15258/337


G/T 


25,55 


26,97 


23,77 


24,1 


99-15261/202 


A/G 


63,06 


59,46 


67,5 


65,15 


99-15256/392 


C/T 


64,96 


61,33 


69,35 


65,3 


99-15056/99 


C/T 


32,72 


36,49 


28,23 


31,11 


99-15280/432 


C/T 


42,28 


44 


40,16 


38,97 


99-15355/150 


C/T 


72,3 


70,39 


74,6 


68,79 


99-15253/382 


C/T 


63,04 


62,67 


63,49 


62,95 


99-5873/159 


C/T 


78,1 


79,05 


76,98 


77,34 



Haplotype frequency analysis

Haplotype analysis for association of chromosome 13q31-q33-related biallelic markers and
schizophrenia was performed by estimating the frequencies of all possible 2-, 3- and 4-marker
haplotypes in the affected and control populations described above. Haplotype estimations were
performed by applying the Expectation-Maximization (EM) algorithm (Excoffier and Slatkin,
1995), using the EM-HAPLO program (Hawley et al., 1994) as described above. Estimated
haplotype frequencies in the affected and control populations were compared by means of a
chi-square statistical test (one degree of freedom).
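The EM estimation step can be illustrated for the simplest case of two biallelic markers, where only the double heterozygote has an ambiguous phase. The following Python sketch is not the EM-HAPLO program; it is a minimal textbook version of the EM idea, assuming random mating and genotypes coded as allele counts, and every name in it is hypothetical.

```python
from itertools import product

# Haplotypes over two biallelic markers, coded (allele at marker 1, allele at
# marker 2), with 1 for the first allele and 0 for the second (hypothetical coding).
HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def compatible_pairs(g1, g2):
    """Ordered haplotype pairs consistent with a genotype.

    g1 and g2 are the numbers (0, 1 or 2) of copies of allele 1 at each marker.
    """
    return [
        (h1, h2)
        for h1, h2 in product(HAPLOTYPES, repeat=2)
        if h1[0] + h2[0] == g1 and h1[1] + h2[1] == g2
    ]


def em_haplotype_frequencies(genotypes, max_iter=500, tol=1e-10):
    """Estimate two-marker haplotype frequencies with the EM algorithm,
    assuming Hardy-Weinberg equilibrium (random mating)."""
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting values
    n = len(genotypes)
    for _ in range(max_iter):
        expected = {h: 0.0 for h in HAPLOTYPES}
        # E-step: split each individual between its compatible phases.
        for g1, g2 in genotypes:
            pairs = compatible_pairs(g1, g2)
            weights = [freqs[a] * freqs[b] for a, b in pairs]
            total = sum(weights)
            if total == 0:  # degenerate case: fall back to equal weights
                weights, total = [1.0] * len(pairs), float(len(pairs))
            for (a, b), w in zip(pairs, weights):
                expected[a] += w / total
                expected[b] += w / total
        # M-step: new frequencies are expected counts over 2n chromosomes.
        new = {h: expected[h] / (2 * n) for h in HAPLOTYPES}
        delta = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new
        if delta < tol:
            break
    return freqs


genotypes = [(2, 2), (2, 2), (0, 0), (0, 0), (1, 1)]
print(em_haplotype_frequencies(genotypes))
```

In this toy example the double heterozygote is progressively assigned to the (1, 1)/(0, 0) phase, because the unambiguous individuals make those haplotypes common.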

Haplotype association results in schizophrenia cases 

Haplotype studies yielded significant results indicating an association of the nucleotide
sequences of the invention with schizophrenia. Significant results are shown in Figures 5 and 6,
including the frequency of the haplotype leading to the maximum chi-square test value
(reference no. (1) in the figures), the test of the frequency of a particular haplotype in cases
versus controls (reference no. (2) in the figures) and the p-value assuming that the test has a
chi-square distribution with 1 degree of freedom (ddl) (reference no. (3) in the figures). The
results of the haplotype analysis using 28 preferred biallelic markers of the invention,
99-24656-260 (A48), 99-24639-163 (A60), 99-24634-108 (A61), 99-7652-162 (A62),
99-16100-147 (A65), 99-5862-167 (A70), 99-5919-215 (A75), 99-24658-410 (A76),
99-24644-194 (A80), 99-5897-143 (A107), 99-24649-186 (A108), 99-16038-118 (A198),
99-15880-162 (A218), 99-25940-182 (A221), 99-16032-292 (A223), 99-16033-244 (A227),
99-15875-165 (A228), 99-16047-115 (A239), 99-25950-121 (A285), 99-25961-376 (A286),
99-25965-399 (A287), 99-25966-241 (A288), 99-25969-200 (A290), 99-25974-143 (A292),
99-25977-311 (A293), 99-25979-93 (A295), 99-25989-398 (A299) and 99-26150-276 (A304),
are shown in Figures 5 and 6. Figures 5 and 6 also show the physical order of the biallelic
markers comprising the haplotypes.

Figure 5 shows the results of the haplotype analysis using the following biallelic
markers located on the approximately 319 kb sequence of SEQ ID No. 1: 99-24656-260 (A48),
99-24639-163 (A60), 99-24634-108 (A61), 99-7652-162 (A62), 99-16100-147 (A65),
99-5862-167 (A70), 99-5919-215 (A75), 99-24658-410 (A76), 99-24644-194 (A80),
99-5897-143 (A107), 99-24649-186 (A108), 99-16038-118 (A198), 99-15880-162 (A218),
99-25940-182 (A221), 99-16032-292 (A223), 99-16033-244 (A227), 99-15875-165 (A228)
and 99-16047-115 (A239).

Figure 6 shows the results of the haplotype analysis using the following biallelic
markers located on the approximately 319 kb of SEQ ID No. 1, as well as additional biallelic
markers located on the human chromosome 13q31-q33 locus: 99-16038-118 (A198),
99-15880-162 (A218), 99-25940-182 (A221), 99-16032-292 (A223), 99-16033-244 (A227),
99-15875-165 (A228), 99-16047-115 (A239), 99-25950-121 (A285), 99-25961-376 (A286),
99-25965-399 (A287), 99-25966-241 (A288), 99-25969-200 (A290), 99-25974-143 (A292),
99-25977-311 (A293), 99-25979-93 (A295), 99-25989-398 (A299) and 99-26150-276 (A304).

A number of biallelic marker haplotypes were shown to be significantly associated with 
schizophrenia. 

Several preferred haplotypes, all showing highly significant association with
schizophrenia and including various 2-, 3- and 4-marker haplotypes, are haplotypes 817, 818,
819, 137, 138, 1 and 2 of Figure 6, and haplotypes 970, 154 and 1 of Figure 5. The p-values,
odds ratios and estimated haplotype frequencies are further described in Figures 5 and 6.
In particular, the two-marker haplotype 1 of Figure 5, consisting of biallelic markers
99-5862-167 (A70) and 99-15875-165 (A228), showed a highly significant p-value of 7.8x10^-5
and an odds ratio of 1.61. Haplotype 818 of Figure 6, consisting of four biallelic markers
(99-16032-292 (A223), 99-25969-200 (A290), 99-25977-311 (A293) and 99-25989-398 (A299)),
presented a p-value of 3.1x10^-7 and an odds ratio of 9.08. Another example showing
significance is haplotype 817 of Figure 6, consisting of four biallelic markers (99-16033-244
(A227), 99-15875-165 (A228), 99-25950-121 (A285) and 99-25979-93 (A295)), which
presented a p-value of 2.4x10^-7 and an odds ratio of 100. Phenotypic permutation tests
confirmed the statistical significance of these results. Estimated frequencies of this haplotype
were 10.5% in cases and 0% in the controls. Haplotype 970 of Figure 5, consisting of four
biallelic markers (99-5919-215 (A75), 99-24658-410 (A76), 99-15875-165 (A228) and
99-16047-115 (A239)), presented a p-value of 7.8x10^-7 and an odds ratio of 2.41. Phenotypic
permutation tests confirmed the statistical significance of these results. Estimated haplotype
frequencies were 25.7% in cases and 12.5% in the controls.
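The comparison of one haplotype between cases and controls, and the odds ratio that summarizes it, can be sketched as a 2 x 2 Pearson chi-square test on chromosome counts. This is a simplified illustration, not the procedure used to produce Figures 5 and 6: it assumes that estimated frequency x 2N can be treated as an observed count, and the group sizes in the example are hypothetical placeholders.

```python
import math


def haplotype_case_control_test(freq_cases, n_cases, freq_controls, n_controls):
    """Pearson chi-square (1 df), p-value and odds ratio for one haplotype.

    Simplifying assumption: estimated frequency x 2N is treated as an observed
    count of chromosomes carrying the haplotype in each group.
    """
    a = freq_cases * 2 * n_cases        # case chromosomes with the haplotype
    b = 2 * n_cases - a                 # case chromosomes without it
    c = freq_controls * 2 * n_controls  # control chromosomes with the haplotype
    d = 2 * n_controls - c              # control chromosomes without it
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, chi-square with 1 df
    odds_ratio = (a * d) / (b * c) if b * c > 0 else math.inf
    return chi2, p_value, odds_ratio


# Frequencies reported for haplotype 970 (25.7% vs 12.5%); group sizes are hypothetical.
chi2, p_value, odds_ratio = haplotype_case_control_test(0.257, 100, 0.125, 100)
print(round(odds_ratio, 2))  # 2.42
```

The odds ratio does not depend on the placeholder group sizes, and the value of about 2.42 is close to the 2.41 reported for haplotype 970; the small gap may reflect rounding of the published frequencies. The chi-square and p-value do depend on the group sizes. For a haplotype absent from controls, such as haplotype 817, this sketch returns an infinite odds ratio.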

Several other significant haplotypes, including several 2-, 3- and 4-marker haplotypes, are
listed in Figures 5 and 6. Considered to be highly significantly associated with schizophrenia
are the most significant 2-marker haplotypes (for example, haplotype 1 of Figure 5 noted above)
and the most significant 3-marker haplotypes (for example, haplotype 137 of Figure 6,
consisting of biallelic markers 99-15875-165 (A228), 99-16047-115 (A239) and
99-25950-121 (A285)).

Further preferred significant haplotypes considered associated with schizophrenia are
haplotypes having p-values below a desired threshold level; all the haplotypes listed in Figures 5
and 6 present p-values below 1.0x10^-2 for 2-marker haplotypes, 1.0x10^-4 for 3-marker
haplotypes, and 1.0x10^-3 for 4-marker haplotypes. Figures 5 and 6 show several 2-marker
haplotypes (haplotypes 1 to 9 and haplotypes 1 to 5 of Figures 5 and 6, respectively) having
p-values ranging from 7.8x10^-5 to 8.6x10°, several 3-marker haplotypes (haplotypes 154 to
163 and 137 to 141 of Figures 5 and 6, respectively) having p-values ranging from 3.9x10"* to
1.1x10^, and several 4-marker haplotypes (haplotypes 970 to 973 and 817 to 836 of Figures 5
and 6, respectively) having p-values ranging from 2.4x10^-7 to 7.3x10^-6.

Additionally, a particularly large number of the significant 2-, 3- and 4-marker
haplotypes comprised the biallelic markers A223, A76, A227, A239, A286, A290 and A299,
and most commonly A228 (99-15875-165), allele T.

The statistical significance of the results obtained for the haplotype analysis was
evaluated by a phenotypic permutation test reiterated 100 times on a computer. For this
computer simulation, data from the affected and control individuals were pooled and randomly
allocated to two groups which contained the same number of individuals as the case-control
populations used to produce the data summarized in Figures 5 and 6. A haplotype analysis was
then run on these artificial groups for the markers included in the haplotypes showing strong
association with schizophrenia. This experiment was reiterated 100 times and the results are
shown in the columns of Figures 5 and 6 labelled "Haplotype test by permutation procedure".
For a given haplotype, these results indicate how many of the 100 iterations yielded a simulated
haplotype with a p-value comparable to the one obtained for the given haplotype. These results,
set forth in Figures 5 and 6, validate the statistical significance of the association between the
haplotypes and schizophrenia.
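The permutation procedure can be sketched as below. Unlike the procedure above, which reruns a full haplotype analysis on each artificial data set, this sketch recomputes a simple user-supplied statistic; the statistic, the 0/1 carrier coding and the toy data are hypothetical.

```python
import random


def permutation_test(cases, controls, statistic, n_perm=100, seed=0):
    """Phenotypic permutation test.

    Pools the individuals, reallocates them at random to two groups of the
    original sizes, recomputes `statistic(group1, group2)` and counts how many
    of the `n_perm` simulated values are at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistic(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:n_cases], pooled[n_cases:]) >= observed:
            count += 1
    return observed, count


def carrier_difference(group1, group2):
    """Hypothetical statistic: absolute difference in the proportion of
    individuals coded 1 (for example, carriers of a given haplotype)."""
    return abs(sum(group1) / len(group1) - sum(group2) / len(group2))


cases = [1] * 30 + [0] * 70
controls = [1] * 10 + [0] * 90
observed, count = permutation_test(cases, controls, carrier_difference)
print(observed, count)
```

A small count out of 100 means that few random reallocations of the phenotypes reproduce a difference as large as the observed one, which is the sense in which the permutation columns of Figures 5 and 6 support the reported associations.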

Example 5c 

Association Study Between Schizophrenia and the Biallelic Markers of the Invention in 
French Canadian Samples 

Collection Of DNA Samples From Affected And Non-Affected Individuals 

Biallelic markers of the present invention were further genotyped in French Canadian 
samples as described above in order to compare the association of the 1st and the 2nd portion of 
Region D with schizophrenia. The population used in the study was the same as described 
above with the exception that 2 male FH+ cases were not included. 
The biallelic markers analyzed in the study include 34 preferred biallelic markers of the
invention located in Region D of the chromosome 13q31-33 region. Included in the analysis
were the following 14 biallelic markers from the first of two portions of Region D:
99-26150/276 (A304), 99-26156/290 (A307), 99-26153/44 (A305), 99-25985/194 (A298),
99-25974/143 (A292), 99-25977/311 (A293), 99-25972/317 (A291), 99-25965/399 (A287),
99-25961/376 (A286), 99-25966/241 (A288), 99-25967/57 (A289), 99-25969/200 (A290),
99-25979/93 (A295) and 99-25989/398 (A299). Also included in the analysis were the following
20 biallelic markers from the second of two portions of Region D: 99-25993/367 (A241),
99-16047/115 (A239), 99-15875/165 (A228), 99-16033/244 (A227), 99-16032/292 (A223),
99-25940/182 (A221), 99-15880/162 (A218), 99-16038/118 (A198), 99-15870/400 (A178),
99-24649/186 (A108), 99-5897/143 (A107), 99-24644/194 (A80), 99-24658/410 (A76),
99-5919/215 (A75), 99-5862/167 (A70), 99-16100/147 (A65), 99-7652/162 (A62),
99-24634/108 (A61), 99-24639/163 (A60) and 99-24656/260 (A48).

Single biallelic marker association results in schizophrenia cases
Single biallelic marker studies yielded significant results, indicating an association of
the nucleotide sequences of the invention with schizophrenia. Biallelic markers used in the
analysis included the set of 34 biallelic markers shown in Table 11 below, 14 of which were
located on the first of two portions of Region D, and 20 of which were located on the second
portion. The distribution of markers is shown in Table 12 below. As summarized in Table 13,
analyses using these biallelic markers demonstrated a significant association with
schizophrenia for 5 markers on the second portion of Region D.

Table 11 
SNPS GENOTYPED 



REGION 


CONTIG 1 


D 






1st portion




2nd portion



POLYMORPHISM 



FREQUENCY IN 
CONTROLS 



99-26150/276 



99-26156/290 



99-25974/143 



99-25977/311 



99-25972/317 



99-25979/93 



99-25989/398
99-25993/367



A/G 



A/C 



A/G 



40 



72 



73 
99-16032/292    A/C            58
99-25940/182    A/G            53
99-15880/162    A/G            58
99-16038/118    A/G            42
99-15870/400    A/G            33
99-24649/186    C/T            65
99-5897/143     A/C            59
99-24644/194    A/G            39
99-24658/410    C/T            58
99-5919/215     A/G            60
99-5862/167     C/T            51
99-16100/147    A/G            50
99-7652/162     A/G            52
99-24634/108    A/T            53
99-24639/163    G/T            44
99-24656/260    A/G            54



Table 12

Region                No. of biallelic   Mean          Mean inter-marker
                      markers (a)        frequency     distance (σ)
D 1st half            14 (14)            0.34 (0.07)   7 (6.3)
D 2nd half            20 (8)             0.42 (0.06)   11 (13)
D 1st and 2nd half    34 (22)            0.39 (0.07)   10.3 (11)



Haplotype frequency analysis

Haplotype analysis for association of chromosome 13q31-q33-related biallelic markers and
schizophrenia was performed by estimating the frequencies of all possible 2-, 3- and 4-marker
haplotypes in the affected and control populations described above. Haplotype estimations were
performed by applying the Expectation-Maximization (EM) algorithm (Excoffier and Slatkin,
1995), using the EM-HAPLO program (Hawley et al., 1994) as described above.

Haplotype association results in schizophrenia cases 

Significant results were also obtained in haplotype studies indicating an association of 
the nucleotide sequences of the invention with schizophrenia. 

The present inventors have previously demonstrated highly significant association of
biallelic markers located on the Region D subregion of the human chromosome 13q31-q33
locus with disease. Figures 7 and 8 describe the results of an analysis of the first and second
portions of Region D using the Omnibus LR test, which compares the profile of haplotype
frequencies, and the Haplo-maxM test, which is based on haplotype frequency differences for
each haplotype in the two groups. This analysis demonstrated an association of the second
portion of Region D with schizophrenia.

For each combination of 2 or 3 biallelic markers, one likelihood ratio test is obtained based
on the haplotype frequency values calculated using the E-M algorithm. A permutation
procedure was used, in which data from the affected and control individuals were pooled and
randomly allocated to two groups containing the same number of individuals as the
case-control populations used to produce the data. A haplotype analysis was then run on these
artificial groups for the markers included in the haplotypes showing strong association with
schizophrenia. This experiment was reiterated 100 times. For a given haplotype, these results
indicate how many of the 100 iterations yielded a simulated haplotype with a p-value
comparable to the one obtained for the given haplotype.
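One likelihood ratio statistic per marker combination can be sketched as a G statistic comparing the case and control haplotype profiles with a single pooled profile. This is an assumption about the general form of the Omnibus LR test, not its documented implementation: the sketch treats EM-estimated haplotype counts as if they were observed, which ignores phase uncertainty, and the counts in the example are invented.

```python
import math


def omnibus_lr(counts_cases, counts_controls):
    """Likelihood ratio (G) statistic comparing two haplotype profiles.

    Inputs are estimated haplotype counts, aligned by haplotype. The statistic
    is 2 * (log L with separate case and control frequencies
    - log L with one pooled frequency profile); with h haplotypes it is
    referred to a chi-square distribution with h - 1 degrees of freedom.
    """
    def log_likelihood(counts):
        total = sum(counts)
        # Zero counts contribute nothing (0 * log 0 is taken as 0).
        return sum(c * math.log(c / total) for c in counts if c > 0)

    pooled = [a + b for a, b in zip(counts_cases, counts_controls)]
    return 2.0 * (log_likelihood(counts_cases) + log_likelihood(counts_controls)
                  - log_likelihood(pooled))


# Hypothetical estimated counts for the four haplotypes of a 2-marker combination.
print(omnibus_lr([120, 80, 60, 40], [100, 100, 50, 50]))
```

Computing this value for every 2- or 3-marker combination in a portion of Region D yields the set of LR test values whose distributions are compared in Figure 7.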

Figure 7 shows a comparison of the LR test value distributions of haplotype
frequencies in the two portions of Region D. This association of the second portion of Region D
with schizophrenia is shown using both 2-marker and 3-marker combinations. The distribution
of LR test values in the different regions was analyzed using a Kruskal-Wallis rank test, whose
statistic is referred to a chi-square distribution with r-1 degrees of freedom, where r represents
the number of value sets compared. As shown, the significance of the association is
demonstrated by a chi-square value (one degree of freedom) of 74.405 and a p-value of less than
1x10^-10 for 2-marker combinations, and a chi-square value (one degree of freedom) of 228.72
and a p-value of 1x10" for 3-marker combinations.
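The Kruskal-Wallis comparison of two sets of test values can be sketched as follows. The implementation uses average ranks for ties and the usual tie correction; the p-value helper covers only the two-group case (r = 2, one degree of freedom) used here, and the LR test values in the example are invented for illustration.

```python
import math


def kruskal_wallis(*groups):
    """Kruskal-Wallis H statistic with average ranks for ties and the usual
    tie correction. Under the null hypothesis H is approximately chi-square
    with r - 1 degrees of freedom, r being the number of groups."""
    pooled = sorted((value, index) for index, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1                      # extend the block of tied values
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def p_value_two_groups(h):
    """Upper tail of a chi-square distribution with 1 df (r = 2 groups)."""
    return math.erfc(math.sqrt(max(h, 0.0) / 2.0))


first_portion = [0.8, 1.2, 2.5, 0.4, 1.9]   # hypothetical LR test values
second_portion = [3.1, 4.7, 2.2, 5.6, 3.9]
h = kruskal_wallis(first_portion, second_portion)
print(h, p_value_two_groups(h))
```

Because the test uses only ranks, it compares where the two distributions of test values lie without assuming any particular shape for them.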

Another association analysis approach based on haplotype frequency differences,
referred to as the Haplo-maxM test, was conducted using Region D biallelic markers. For one
combination of markers having h haplotypes, h differences of haplotype frequencies are
compared via a Pearson chi-square statistic (one degree of freedom). The Haplo-maxM test
selects the difference showing the maximum positive test value between cases and controls
(rejecting test values based on rare haplotype frequencies, i.e. with an estimated number of
haplotypes below 10); for one combination of markers there is therefore one Max-M test
value. The results of the Haplo-maxM test using Region D biallelic markers are shown in
Figure 8.

Figure 8 shows the distribution of Haplo-maxM test values obtained for both 2-marker
and 3-marker combinations in the two portions of Region D, demonstrating an association of
the second portion of Region D with schizophrenia. The comparison of the distributions of
Haplo-maxM test values in the two regions was analyzed using a Kruskal-Wallis rank test,
whose statistic is referred to a chi-square distribution with r-1 degrees of freedom, where r
represents the number of value sets compared. As shown, the significance of the association is
demonstrated by a chi-square value (one degree of freedom) of 34.839 and a p-value of less than
3.58x10^-9 for 2-marker combinations, and a chi-square value (one degree of freedom) of
13.773 and a p-value of 2.6x10^-4 for 3-marker combinations.

The results from the Haplo-maxM tests further confirm the association shown by the
Omnibus LR test.

Results of association studies discussed above using biallelic markers of the invention 
are further summarized in Table 13 below, showing a significant association of the biallelic 
markers with schizophrenia in both single biallelic marker and haplotype analysis. 



Table 13

                         Single-point analysis              Multi-point analysis
                                                            (haplotype-based analysis)
                         No. of allelic     No.             Omnibus LR test*
                         freq. differences  significant     2-mks   3-mks   4-mks
                         > 10%              allelic tests
Region D, 1st portion    0                  0               0,03    0,05    0,06
Region D, 2nd portion    0                  5               0,30    0,30    0,31

* percentage of significant tests (5% level of significance)
Cases (N=213) / Controls (N=241)



Example 5d 

Association Study Between Bipolar Disorder and the Biallelic Markers of the Invention 
Description of study design 

Biallelic markers of the invention were analyzed in bipolar disorder cases. As in the
examples above, single and multi-point analyses showed a significant association of the markers
of the invention, of Region D of the chromosome 13q33 locus, and more particularly of a
subregion of Region D, with bipolar disorder.

A) Description of the affected population

All the samples were collected from a study of bipolar disorder undertaken in a hospital
located south of Buenos Aires, Argentina, generally representing a population estimated at
about 400,000 inhabitants. Patients were evaluated by four doctors in 1994 and 1995. The
study design involved the ascertainment of cases and their first-degree relatives (parents or
siblings). 514 individuals were available for the study. This group consisted of 158 subjects
from 51 different families, and 356 independent subjects.

As a whole, bipolar disorder cases were ascertained according to the diagnosis of
bipolar disorder established by the DSM-IV (Diagnostic and Statistical Manual, Fourth edition,
Revised 1994, American Psychiatric Press).
Also available for consideration for each coded case were age, sex, nationality of
parents and grandparents, ethnic origin, familial composition, marital status, socio-economic
level, educational level, professional situation, employment, relational activities, age of onset
of psychiatric symptoms, age of first consultation, occurrences of obstetric or prenatal
incidents, suicide attempts, other medical conditions, treatment for or occurrence of a
neurological condition, familial occurrence of symptoms, previous or concurrent use of
psychotropic drugs, other admissions to a hospital or medical treatments, and diagnostic reason
for admission including (a) DSM-IV diagnosis and (b) symptoms first presented on admission
to hospital.

Available for study were 226 ascertained bipolar disorder cases, of which 203 were
independent cases. This group consisted of 51 cases from 51 families, 20 cases in relatives
thereof, and 155 independent cases. Upon elimination of 3 cases from the initial 155
independent cases due to discovery of a familial relation, the total number of independent
cases was 203.

Cases were classified according to bipolar disorder type. The cases included 115
bipolar disorder type I individuals (including 1 rapid cycling case), 67 bipolar disorder type II
individuals (including 1 rapid cycling case), 18 unclassified bipolar disorder cases, and 3 cases
which remained unclassified due to lack of or inconsistent information.

The 203 independent cases were examined for a familial history of psychosis; 53 of
these cases reported an occurrence of psychosis (characterized as schizophrenia or bipolar
disorder) among first-degree relatives (father, mother, brothers, sisters or children).
B) Description of the unaffected population
Available for study were 201 controls who had not been affected by any psychiatric
difficulties and had not reported any familial history of psychiatric difficulties. Age, sex and
ethnic origin of the unaffected population were also available for consideration.

C) Case and control populations selected for the association study
For the association study, the case population under study consisted of 201 individuals
selected from the 226 total cases above; the control population consisted of 198 individuals
selected from the 201 controls described above.

The association data presented in this Example 5d were obtained on the population
sizes detailed in Table 14 below.



Table 14

Cases and Control Populations Selected for the Association Study

Sample type                                   Cases    Controls
Sample size                                   201      198
Gender                    Male                68       81
                          Female              124      117
                          Missing             9
Ethnic origin             Caucasian           182      177
                          Non Caucasian       5        21
                          Missing             14
Familial history of       positive (FH+)      54       0
psychosis (FH)*           none (FH-)          147      198

* : close relatives (first degree)







Both case and control populations form two groups, each group consisting of unrelated 
individuals that do not share a known common ancestor. 

Genotyping of affected and control individuals

The general strategy was to individually scan the DNA samples from all individuals in 
each of the populations described above in order to establish the allele frequencies of biallelic 
markers, and among them the biallelic markers of the invention, in the diploid genome of the 
tested individuals belonging to each of these populations. 

Allelic frequencies of every biallelic marker in each population (cases and controls) 
were determined by performing microsequencing reactions on amplified fragments obtained by 
genomic PCR performed on the DNA samples from each individual. Genomic PCR and
microsequencing were performed as detailed above in Examples 1 to 3 using the described PCR 
and microsequencing primers. 

Association analysis 

The association analysis included 30 preferred biallelic markers of the invention located
in Region D of the chromosome 13q31-33 region. Included in the analysis were the following
14 biallelic markers from the first of two subjective portions of Region D: 99-26150/276
(A304), 99-26156/290 (A307), 99-26153/44 (A305), 99-25985/194 (A298), 99-25974/143
(A292), 99-25977/311 (A293), 99-25972/317 (A291), 99-25965/399 (A287), 99-25961/376
(A286), 99-25966/241 (A288), 99-25967/57 (A289), 99-25969/200 (A290), 99-25979/93 (A295)
and 99-25989/398 (A299). Also included in the analysis were the following 16 biallelic markers
from the second of two portions of Region D: 99-25993/367 (A241), 99-16047/115 (A239),
99-15875/165 (A228), 99-16033/244 (A227), 99-16032/292 (A223), 99-25940/182 (A221),
99-15880/162 (A218), 99-16038/118 (A198), 99-15870/400 (A178), 99-24649/186 (A108),
99-5897/143 (A107), 99-24644/194 (A80), 99-5919/215 (A75), 99-5862/167 (A70),
99-16100/147 (A65) and 99-7652/162 (A62).

A) Single biallelic marker association results in bipolar disorder cases
For each allele of the biallelic markers included in this study, the difference between the
allelic frequency in the unaffected population and in the population affected by bipolar disorder
was calculated and the absolute value of the difference was determined. The set of biallelic
markers and their allelic frequencies included in this study are set forth in Table 15. The greater
the difference in allelic frequency for a particular biallelic marker or a particular set of biallelic
markers, the more probable an association between the genomic region harboring this particular
biallelic marker or set of biallelic markers and bipolar disorder. Allelic frequencies were also
useful to check that the markers used in the haplotype studies meet the Hardy-Weinberg
proportions (under random mating assumptions).
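The Hardy-Weinberg check mentioned above can be sketched as a chi-square goodness-of-fit test on the three genotype classes of one biallelic marker. This is a common textbook approach, offered here as an assumption: the source does not state which test was used, and the genotype counts in the tests are invented.

```python
import math


def hardy_weinberg_test(n_11, n_12, n_22):
    """Chi-square goodness-of-fit test of Hardy-Weinberg proportions for one
    biallelic marker, from genotype counts (homozygote 1, heterozygote,
    homozygote 2). Returns (chi2, p_value) with 1 degree of freedom."""
    n = n_11 + n_12 + n_22
    p = (2 * n_11 + n_12) / (2 * n)  # frequency of allele 1
    q = 1 - p
    observed = (n_11, n_12, n_22)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Classes with zero expected count (monomorphic marker) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

One degree of freedom is used because three genotype classes are fitted with one estimated allele frequency (3 - 1 - 1 = 1). A large chi-square, for example from an excess of homozygotes, would flag a marker that departs from random-mating proportions.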

Table 15 



REGION 



CONTIG



POSITION 
ON CONTIG 



SNPS GENOTYPED 



99-26150/276 
99-26156/290 
99-26153/44 
99-25985/194 
99-25974/143 



99-25965/399 
99-25961/376 
99-25966/241 
99-25967/57 
99-25969/200 
99-25979/93 



99-16047/115 



99-25940/182 



99-16038/118 



POLYMORPHISM 



A/G 



FREQUENCY 
IN CONTROLS 

62,93 
383,41    99-15870/400    A/G            29,65
394,16    99-24649/186    C/T            66,57
395,27    99-5897/143     A/C            52,6
409,93    99-24644/194    A/G            38,29
424,95    99-5919/215     A/G            60,63
441,62    99-5862/167     C/T            46,53
444,00    99-16100/147    A/G            48,84
445,84    99-7652/162     A/G            49,7

TOTAL     30







(1): frequency (%) in Caucasian controls (N=177) of the first allele (alphabetic order)
Region D was arbitrarily split in two halves (D 1st half and D 2nd half) for the purpose of the
analysis.

The present inventors have previously demonstrated significant association of biallelic
markers located on the Region D subregion of the human chromosome 13q31-33 region with
disease. Of the set of 30 biallelic markers shown in Table 15, D 1st half contained 14 markers
and D 2nd half contained 16 markers.

Table 15 also shows the physical order of the biallelic markers on Region D of the 
human chromosome 13q31-q33 region. The mean intermarker distances of the biallelic markers 
on the first and the second subjective portions of Region D were as listed below in Table 16. 



Table 16

Region                Mean inter-marker distance (std)
D 1st half            7.80 (6.33)
D 2nd half            9.79 (8.78)
D 1st and 2nd half    9.58 (8.46)
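A mean inter-marker distance and its standard deviation, as in Table 16, can be computed from the gaps between consecutive marker positions. In the sketch below, the input uses only the eight contig positions legible in Table 15 (99-15870/400 to 99-7652/162), which are a subset of the D 2nd half markers, so Table 16 is not reproduced. Whether Table 16 uses the sample or the population standard deviation is not stated; the sample version is an assumption.

```python
import statistics


def inter_marker_distances(positions):
    """Mean and sample standard deviation of the gaps between consecutive
    marker positions (positions assumed sorted along the contig)."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return statistics.mean(gaps), statistics.stdev(gaps)


# Contig positions legible in Table 15 for eight D 2nd-half markers (a subset only).
positions = [383.41, 394.16, 395.27, 409.93, 424.95, 441.62, 444.00, 445.84]
mean_gap, sd_gap = inter_marker_distances(positions)
print(round(mean_gap, 2), round(sd_gap, 2))
```

The mean gap equals the span between the first and last marker divided by the number of gaps, so it depends only on the end positions; the standard deviation reflects how unevenly the markers are spaced within that span.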



The analysis using selected Region D biallelic markers of the invention demonstrated a 
significant association with bipolar disorder for the second portion of Region D. The analysis 
was conducted using the sample population described above with 182 Caucasian cases and 177 
Caucasian controls selected from the total case and control group. 

One biallelic marker in particular, 99-15875/165 (A228), located on the second half of
Region D, demonstrated a significant association with disease at a significance level better
than 5% (corresponding to an absolute value of log10(p-value) of 1.3).
B) Haplotype association results in bipolar disorder cases
Haplotype analysis for association of chromosome 13q31-q33-related biallelic markers and
bipolar disorder was performed by estimating the frequencies of all possible 2-, 3- and 4-marker
haplotypes in the affected and control populations described above. Haplotype frequency
estimations were performed by applying the Expectation-Maximization (EM) algorithm
(Excoffier and Slatkin, 1995), modified by Nicholas Schork.

Significant results were obtained in haplotype studies indicating an association of the
nucleotide sequences of the invention with bipolar disorder. The haplotype analysis shown in
Figures 9A, 9B, 10A, 10B, 11A and 11B was conducted using the sample population
described above, with 182 Caucasian cases and 177 Caucasian controls selected from the total
case and control group.

Using the Omnibus LR test, which compares the profile of haplotype frequencies, and the
Haplo-maxM test, which is based on haplotype frequency differences for each haplotype in the
two groups, Figures 9A, 9B, 10A, 10B, 11A and 11B show the results of a comparison of the
first and second portions of Region D, which demonstrated an association of the second portion
of Region D with bipolar disorder.

a - Omnibus LR test values

For a given combination of 2, 3 or 4 biallelic markers, one likelihood ratio test (LR test)
is obtained based on the haplotype frequency values calculated using the E-M algorithm.

Figures 9A and 9B show a comparison of the distributions of LR test values of haplotype
frequencies in the two portions of Region D. This association of the second portion of Region
D with bipolar disorder is shown using both 2-marker and 3-marker combinations. A Kruskal-Wallis
rank test was used to compare the distributions of LR test values in the two respective portions
of Region D. Under the null hypothesis of no difference between the sets compared, this test
statistic has an asymptotic chi-square distribution with (r-1) degrees of freedom, where r
represents the number of sets compared. Here, the 2 portions of Region D are compared, so r = 2
and the asymptotic chi-square distribution has 1 degree of freedom. As shown, the significance of
the association is demonstrated by a chi-square value (one degree of freedom) of 46.62 and a
p-value of 8.62x10^-12 for 2-marker combinations, and a chi-square value (one degree of freedom)
of 124.72 and a p-value of 5.86x10^-29 for 3-marker combinations.
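The Kruskal-Wallis comparison and the one-degree-of-freedom p-values can be reproduced with a short sketch. It uses the standard H statistic with the usual tie correction and the identity P(chi-square with 1 df > x) = erfc(sqrt(x/2)). The function names are hypothetical, and the LR test value distributions themselves are not reproduced here.

```python
import math


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic for two or more samples, corrected for ties.

    Under the null hypothesis of identical distributions, H is asymptotically
    chi-square distributed with (number of groups - 1) degrees of freedom.
    """
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        # Extend j over the run of values tied with pooled[i].
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(group) for r, group in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def chi2_sf_1df(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))
```

For example, `chi2_sf_1df(46.62)` evaluates to about 8.6x10^-12 and `chi2_sf_1df(124.72)` to about 5.9x10^-29, consistent with the p-values reported above.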
b - Haplo-max test values

Another association analysis approach, based on haplotype frequency differences and referred to
as the Haplo-max test, was conducted using Region D biallelic markers. The Haplo-max test
selects the difference showing the maximum positive (maxM) or negative (maxS) test value between
cases and controls (rejecting test values based on rare haplotype frequencies, i.e., with an
estimated number of haplotype carriers below 10); for one combination of markers there are
therefore one Max-M and one Max-S test value.
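A minimal sketch of the Haplo-max selection step follows. The per-haplotype test statistic is not specified here, so a two-proportion z statistic on the estimated case and control haplotype frequencies is assumed. The rarity filter is approximated by the combined estimated haplotype count (frequency times 2N chromosomes in each group) falling below 10. The function name and both of these choices are hypothetical.

```python
import math


def haplo_max(case_freqs, control_freqs, n_cases, n_controls, min_count=10):
    """Return (maxM, maxS): the largest and the smallest (most negative)
    per-haplotype test values, skipping rare haplotypes.

    Frequencies map haplotype -> estimated frequency; n_* are numbers of
    individuals. Returns (None, None) if no haplotype passes the filter.
    """
    chrom_cases, chrom_controls = 2 * n_cases, 2 * n_controls
    stats = []
    for h in set(case_freqs) | set(control_freqs):
        p_case = case_freqs.get(h, 0.0)
        p_ctrl = control_freqs.get(h, 0.0)
        # Estimated number of copies of this haplotype in both samples.
        count = p_case * chrom_cases + p_ctrl * chrom_controls
        if count < min_count:
            continue  # reject test values based on rare haplotypes
        pooled = count / (chrom_cases + chrom_controls)
        se = math.sqrt(pooled * (1 - pooled)
                       * (1 / chrom_cases + 1 / chrom_controls))
        if se > 0:
            stats.append((p_case - p_ctrl) / se)
    if not stats:
        return None, None
    return max(stats), min(stats)
```

Each marker combination thus yields one maxM and one maxS value, whose distributions can then be compared between the two portions of Region D.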

Figures 10A and 10B show the distribution of Haplo-maxM test values obtained for both
2-marker and 3-marker combinations in the two portions of Region D, demonstrating an
association of the second portion of Region D with bipolar disorder. The distributions of
Haplo-maxM test values in the two portions were compared using a Kruskal-Wallis rank test, whose
statistic here follows an asymptotic chi-square distribution with 1 degree of freedom. As shown,
the significance of the association is demonstrated by a chi-square value of 29.07 and a p-value
of 6.98x10^-8 for 2-marker combinations, and a chi-square value of 98.63 and a p-value of
3.04x10^-23 for 3-marker combinations.

Figures 11A and 11B show the distribution of Haplo-maxS test values, again obtained for all
2-marker and 3-marker combinations in the two portions of Region D, demonstrating an
association of the second portion of Region D with bipolar disorder. The distributions of
Haplo-maxS test values in the two portions were compared using a Kruskal-Wallis rank test with
one degree of freedom. As shown, the significance of the association is demonstrated by a
chi-square value of 34.6 and a p-value of 4.05x10^-9 for 2-marker combinations, and a chi-square
value of 98.31 and a p-value of 3.58x10^-23 for 3-marker combinations.

The results from the haplo-maxM and haplo-maxS tests thus further confirm the 
association shown using the Omnibus LR test results. 

Example 5e 

Confirmation of Associations With Schizophrenia and Bipolar Disorder ("SCREENING 
II") 

Results obtained above using French Canadian schizophrenia samples and Argentinian 
bipolar disorder cases were confirmed in larger screening samples and in several different 
populations using markers spanning Region D of the chromosome 13q31-q33 region. 

In the confirmation studies, the French Canadian schizophrenia samples (Algene) described
above, additional United States schizophrenia samples and Argentinian bipolar disorder
(Labimo) samples were analyzed in sub-regions of Region D referred to as sub-regions D1 to
D4. The schizophrenia sample referred to as the Algene (or French Canadian) sample and the
bipolar disorder sample referred to as the Labimo (Argentinian) sample are as described above.
The United States schizophrenia samples are described in Table 17 below.
Table 17

United States Schizophrenia Cases and Control Populations (United States Caucasians)

Sample type           Subgroup                                               Number
Cases                 European Caucasians (26 female, 66 male)                   92
                      Ashkenazi Caucasians (7 female, 17 female)                 24
                      Other Caucasians (7 female, 8 male)                        15
                      Familial history of psychosis (FH) positive (FH+)
Random US Controls                                                              188



A set of 32 SNPs covering sub-regions D1 to D4 (mean density of 1 SNP/25 kb) was
genotyped on the two different schizophrenia samples and the one bipolar disorder sample. The 32
biallelic markers genotyped are shown in Table 18.



Table 18

SNPs             Polymorphism    % Frequency in Algene Controls (Hl)
99-5873/159      C/T             22
99-30329/380     C/T             48
99-15253/382     C/T             37
99-15280/432     C/T             39
99-15256/392     C/T             35
99-15258/337     G/T             24
99-27345/189     G/C             26
99-26150/276     A/G             48
99-25974/143     A/G             23
99-25950/121     G/C             28
99-25972/317     C/T             28
99-25965/399     A/G             50
99-25966/241     A/G             37
99-25989/398     A/G             28
99-16047/115     C/T             25
99-16052/214     A/G             37
99-15875/165     C/T             34
99-16105/152     A/G             46
99-16032/292     A/C             44
99-15880/162     A/G             44
99-15870/400     A/G             30
99-5897/143      A/C             39
99-24644/194     A/G             41
99-24658/410     C/T             39
99-5919/215      A/G             40
99-5862/167      C/T             49
99-24634/108     A/T             50
99-24656/260     A/G             46
99-31960/363     A/G             39
8-128/33         C/T             23
99-27935/193     G/C             21
99-27943/150     G/C             35



For each of the three populations, the numbers of significant tests in each sub-region of
Region D, based on single-point and multipoint biallelic marker analyses, were compared between
cases and controls. For single-point analyses, excess and deficiency of heterozygotes
(Hardy-Weinberg disequilibrium coefficient), allelic and genotypic association analyses, and
logistic regression analyses were compared. For multipoint analyses, the haplotype frequency
differences between cases and controls were examined, reported as MaxM for the maximum positive
difference and MaxS for the maximum negative difference, together with the Omnibus LR test. The
Haplo-max tests giving the MaxS and MaxM results and the Omnibus LR test are known and are
otherwise described herein. As noted in Figures 12 to 17, for the tests marked with footnote
(1), significance thresholds were assessed from observed distributions, inferred by taking into
account the D1, D2, D3 and D4 sub-regions for each sub-population studied relative to the number
of tests performed; for the tests marked with footnote (2) in Figures 12 to 17, significant tests
were defined as those having a significance level of 5% or better.
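The single-point statistics named above can be illustrated with a short sketch. It assumes genotype counts for one biallelic marker and uses the standard Hardy-Weinberg disequilibrium coefficient D_A = P_AA - p_A^2 with its usual 1-df chi-square statistic, plus a Pearson chi-square on the 2x2 table of allele counts for allelic association. The function names are hypothetical, and the exact test implementations used in these analyses are not specified here.

```python
def hw_disequilibrium(n_aa, n_ab, n_bb):
    """Hardy-Weinberg disequilibrium coefficient for one biallelic marker.

    Returns (D_A, chi2) where D_A = P_AA - p_A**2 and chi2 = N * D_A**2 /
    (p_A**2 * q_A**2) is the usual 1-df test statistic. D_A > 0 indicates a
    deficiency of heterozygotes, D_A < 0 an excess. Assumes 0 < p_A < 1.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    d = n_aa / n - p * p
    chi2 = n * d * d / (p * p * q * q)
    return d, chi2


def allelic_chi2(case_a, case_b, control_a, control_b):
    """Pearson chi-square (1 df, no continuity correction) on the 2x2 table
    of allele counts in cases versus controls."""
    n = case_a + case_b + control_a + control_b
    num = n * (case_a * control_b - case_b * control_a) ** 2
    den = ((case_a + case_b) * (control_a + control_b)
           * (case_a + control_a) * (case_b + control_b))
    return num / den
```

With genotype counts 30/40/30, for instance, p_A = 0.5 and D_A = 0.05 (40 observed heterozygotes against 50 expected), giving a chi-square of 4.0.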

The present inventors have found that samples from three different populations all show
a significant association of the schizophrenia trait with biallelic markers located in Region D,
thus confirming previous association studies with different samples and markers. Furthermore,
the inventors have found in all three populations that the association is most significant in
sub-region D3. Thus, it is likely that a gene associated with schizophrenia and bipolar disorder
resides in this region. The sbgl and g35030 nucleic acid sequences described herein reside in
sub-region D3.

In addition to results using markers in previous analyses, analyses with the 32 biallelic
markers listed in Table 18 demonstrated significant results in single-point analyses for several
newly analyzed biallelic markers. In particular, markers 99-25974-143 (A292), 99-25972-317
(A291), 99-15870-400 (A178) and 99-24656-260 (A48) demonstrated a statistically significant
excess or deficiency of heterozygotes.

Schizophrenia: Algene (French Canadian) 

The analysis using Algene samples compared (1) the total cluster of patients selected for
analysis, (2) cases of the cluster showing a familial history of psychosis (FH+) and (3) cases of
the cluster with an absence of familial history of psychosis (FH-) with Algene control
samples. Additionally, for further comparison, the number of significant tests in Region D and in
each of the sub-regions of Region D compared among total cases and total controls from the
screening sample of Example 5b above is listed in Figure 12 under "first screening sample".
As previously reported, the original French Canadian (Algene) samples show a significant
association of the schizophrenia trait with biallelic markers located in Region D, in both
single-point and multipoint analyses. Furthermore, the results show that the association is
clearly confined to sub-region D3 and does not extend to D2 and D4. In single-point analyses, a
significant concentration of biallelic markers spanning the sbgl gene presented an excess of
heterozygotes in familial cases. Five of 13 (5/13) markers around sbgl were significant in
allelic association analyses.

Figure 12 provides the results from a single-point and multipoint biallelic marker analysis
comparing sub-regions D1, D2, D3 and D4 located in the chromosome 13q31-q33 region.

Figure 13 shows the sum of the results shown in Figure 12 over the larger Region D
span tested (i.e., D1, D2, D3 and D4).

Figures 12 and 13 thus demonstrate that there is a significant association between Region D
and schizophrenia in the Algene sample. Furthermore, these figures show that the number and
percentage of significant tests was highest in sub-region D3, consistently across comparisons
of controls with FH+, FH- and total cases. In comparing the number of significant tests in
sub-region D3 between FH+ and FH- cases, the figures show that the association is more clearly
observed in cases having a familial history of psychosis; single-point analyses suggested a
higher number of significant tests in the FH+ cases than multipoint analyses.

Figures 12 and 13 also show that a larger screening sample confirms the results of the
smaller sample from the first screening of Algene samples, both for the larger Region D and for
sub-region D3.

Schizophrenia: United States Schizophrenia samples 

As in the French Canadian samples, the present inventors have found Region D, and 
more specifically sub-region D3, to be significantly associated with schizophrenia in a first, 
smaller screening sample of the United States Schizophrenia samples. Further analysis in the 
United States Schizophrenia population using a set of biallelic markers covering Region D 
confirms that the association of sub-region D3 with schizophrenia is of high statistical 
significance. 

The United States Schizophrenia samples selected for the analysis consisted of the 92 
European Caucasians. Two analyses were performed, a first analysis using controls consisting 
of 188 random US controls, and a second analysis where controls consisted of 241 controls 
from the French Canadian samples described above. 

Figure 14 provides the results from a single-point and multipoint biallelic marker analysis
comparing sub-regions D1, D2, D3 and D4 located in the chromosome 13q31-q33 region. Figure
15 shows the sum of the results shown in Figure 14 over the larger Region D span tested (i.e.,
D1, D2, D3 and D4).

As shown in the figures, the analysis in United States Schizophrenia samples also
suggests a significant association of sub-region D2 with schizophrenia when multipoint
analyses are considered. However, this association of sub-region D2 with schizophrenia is of
lesser statistical significance than the association of schizophrenia with sub-region D3, because
a lower number of tests were carried out in sub-region D2. Additionally, one marker in particular
(99-5897/143), located in the sbgl gene, showed a significant excess of heterozygotes in
schizophrenia familial cases.

In general, the number of significant tests in United States Schizophrenia samples was
lower than in the French Canadian population. This may be attributed to the higher heterogeneity
of the United States Schizophrenia sample in comparison to the French Canadian samples.
Analyses using the United States Schizophrenia samples were done using either Caucasian
controls from the French Canadian samples or US random controls.

Bipolar disorder: Labimo (Argentinian) 

As in the French Canadian and United States Schizophrenia samples, the present 
inventors have found Region D, and more specifically sub-region D3, to be significantly 
associated with bipolar disorder in Labimo samples from Argentina. Further analysis using a 
more extensive set of biallelic markers covering Region D confirms that the association of sub- 
region D3 with bipolar disorder is of high statistical significance. 

Figure 16 provides the results from a single-point and multipoint biallelic marker analysis
comparing sub-regions D1, D2, D3 and D4 located in the chromosome 13q31-q33 region. Figure
17 shows the sum of the results shown in Figure 16 over the larger Region D span tested (i.e.,
D1, D2, D3 and D4). While results showed the most significant association for D3 in Labimo
samples, some background signal was seen for D2. It is possible that this variance in the
percentage of significant tests reflects the higher relative heterogeneity of the Labimo samples
in comparison to the French Canadian samples.

Figures 16 and 17 thus demonstrate that there is a significant association between Region D
and bipolar disorder in the Labimo sample.

Analyses of Labimo samples were also conducted separately in bipolar disorder I and
bipolar disorder II cases, as shown in Figure 16. In comparisons of results obtained with bipolar
disorder types I and II, the association of sub-region D3 with the disease tended to be more
clearly found in bipolar disorder I cases.



Example 6 

Preparation of Antibody Compositions to the sbgl protein 

Substantially pure protein or polypeptide is isolated from transfected or transformed cells
containing an expression vector encoding the sbgl protein or a portion thereof. The concentration
of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter
device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can
then be prepared as follows:

A. Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibody to epitopes in the sbgl protein or a portion thereof can be prepared
from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature
256:495 (1975) or derivative methods thereof. Also see Harlow, E., and D. Lane. 1988.
Antibodies: A Laboratory Manual. Cold Spring Harbor Laboratory, pp. 53-242.

Briefly, a mouse is repetitively inoculated with a few micrograms of the sbgl protein or a
portion thereof over a period of a few weeks. The mouse is then sacrificed, and the
antibody-producing cells of the spleen are isolated. The spleen cells are fused by means of
polyethylene glycol with mouse myeloma cells, and the excess unfused cells are destroyed by growth
of the system on selective media comprising aminopterin (HAT media). The successfully fused cells
are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the
culture is continued. Antibody-producing clones are identified by detection of antibody in the
supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described
by Engvall (1980), and derivative methods thereof. Selected positive clones can be expanded and
their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody
production are described in Davis, L. et al. Basic Methods in Molecular Biology, Elsevier, New
York, Section 21-2.

B. Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogeneous epitopes in the sbgl protein
or a portion thereof can be prepared by immunizing a suitable non-human animal with the sbgl
protein or a portion thereof, which can be unmodified or modified to enhance immunogenicity. A
suitable non-human animal, preferably a non-human mammal, is selected, usually a mouse, rat,
rabbit, goat, or horse. Alternatively, a crude preparation which has been enriched for sbgl
concentration can be used to generate antibodies. Such proteins, fragments or preparations are
introduced into the non-human mammal in the presence of an appropriate adjuvant (e.g.
aluminum hydroxide, RIBI, etc.) which is known in the art. In addition, the protein, fragment or
preparation can be pretreated with an agent which will increase antigenicity; such agents are
known in the art and include, for example, methylated bovine serum albumin (mBSA), bovine
serum albumin (BSA), Hepatitis B surface antigen, and keyhole limpet hemocyanin (KLH).
Serum from the immunized animal is collected, treated and tested according to known
procedures. If the serum contains polyclonal antibodies to undesired epitopes, the polyclonal
antibodies can be purified by immunoaffinity chromatography.

Effective polyclonal antibody production is affected by many factors related both to the
antigen and the host species. Also, host animals vary in response to site of inoculation and
dose, with both inadequate and excessive doses of antigen resulting in low-titer antisera. Small
doses (ng level) of antigen administered at multiple intradermal sites appear to be most
reliable. Techniques for producing and processing polyclonal antisera are known in the art; see,
for example, Mayer and Walker (1987). An effective immunization protocol for rabbits can be
found in Vaitukaitis, J. et al. J. Clin. Endocrinol. Metab. 33:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum harvested when the
antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion
in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony,
O. et al., (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of
serum (about 12 µM). Affinity of the antisera for the antigen is determined by preparing
competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of
Clinical Immunology, 2d Ed. (Rose and Friedman, Eds.), Amer. Soc. for Microbiol., Washington,
D.C. (1980).

Antibody preparations prepared according to either the monoclonal or the polyclonal
protocol are useful in quantitative immunoassays which determine concentrations of
antigen-bearing substances in biological samples; they are also used semi-quantitatively or
qualitatively to identify the presence of antigen in a biological sample. The antibodies may also
be used in therapeutic compositions for killing cells expressing the protein or reducing the levels
of the protein in the body.



Example 7 

Study of effect of sbgl peptides on behavior of mice 



Animals:

Male C57BL6 adult mice (approximately 6 weeks old)

Peptides:

sbgl peptide: NH2-QPLERMWTCNYNQQKDQSCNHKEITSTKAE-COOH
Control 1: NH2-REAHKSETISSKLQKEKQIKKQ-COOH
Control 2: NH2-MHMKTILGPRLGLGE-COOH

Protocol:

1. Inject mice intraperitoneally with 50 µg peptide in 200 µl sterile physiological saline
(n = 4 / peptide).

2. Place mice in clean empty cages containing no litter and only a small test tube rack. The
cages are covered with a plastic sheet to enable taking photographs and video-tracking.

3. Observe behavior for one-minute periods from t = 5 min up to t = 45 min, as indicated.
Any locomotor or stereotypy movements were recorded as positive over 1 min intervals.
Locomotor movements include climbing and rearing, while stereotypy movements include
grooming and scratching. At the end of the experiment, the number of movements was added
up for each animal.

Results:

1. Mice injected with the sbgl peptide exhibited a decrease in the frequency of their
movements over the time course of the experiment, as shown in Figure 18. This is illustrated
in the top left panel of Figure 18, where we compare the average number of movements at 3
separate time points (5, 10 and 15 min) with the average movements per min in the last period
of observations (30, 35, 40 and 45 min). The sbgl peptide also increased stereotypy; this
effect was most prominent during the last period of observations. However, because the onset
of stereotypy was variable, we presented the data as the average of stereotypy for observations
over the entire time period.
The disclosures of all issued patents, published PCT applications, scientific references or
other publications cited herein are incorporated herein by reference in their entireties.

Although this invention has been described in terms of certain preferred embodiments,
other embodiments which will be apparent to those of ordinary skill in the art in view of the
disclosure herein are also within the scope of this invention. Accordingly, the scope of the
invention is intended to be defined only by reference to the appended claims.
CLAIMS



1. An isolated, purified or recombinant polynucleotide comprising a contiguous span of at least
12 nucleotides selected from the group consisting of: nucleotide position range 213818 to
243685 of SEQ ID No. 1, SEQ ID Nos. 2 to 26 and 44 to 111, and the complements thereof.

2. An isolated, purified or recombinant polynucleotide comprising a contiguous span of at least
12 nucleotides selected from the group consisting of: SEQ ID Nos 36 to 40, SEQ ID Nos. 112
to 229, and nucleotide position ranges 31 to 292651 and 292844 to 319608 of SEQ ID No. 1,
and the complements thereof.

3. An isolated, purified or recombinant polynucleotide according to claim 2, wherein said
contiguous span of SEQ ID No 1 or the complements thereof comprises at least 1 of the
following nucleotide positions of SEQ ID No 1:

(a) 292653 to 296047, 292653 to 292841, 295555 to 296047, and 295580 to 296047; 

(b) 31 to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593 to
25740, 29388 to 29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 and 65854 to 67854;

(c) 94124 to 94964; 

(d) 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952, 216661 to
217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to 231412, 231787
to 231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719 to 239807,
239719 to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617, 240528 to
240644, 240528 to 240824, 240528 to 240994, 240528 to 241685, 240800 to 240993 and
241686 to 243685; and

(e) 201188 to 201234, 214676 to 214793, 215702 to 215746 and 216836 to 216915.

4. An isolated, purified or recombinant polynucleotide according to claims 1 or 2, wherein said
span comprises a biallelic marker selected from the group consisting of A1 to A69, A71 to A74,
A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to A222, A224
to A246, A250, A251, A253, A255, A259, A266, A268 to A232 and A328 to A489.

5. A recombinant vector comprising a polynucleotide according to any one of claims 1 to 4. 



6. A host cell comprising a recombinant vector according to claim 5. 
7. A non-human host animal or mammal comprising a recombinant vector according to claim
5.

8. A mammalian host cell comprising an sbgl, g34665, sbg2, g35018 or g35017 gene 
disrupted by homologous recombination with a knock out vector, comprising a polynucleotide 
according to any one of claims 1 or 3. 

9. A non-human host mammal comprising an sbgl, g34665, sbg2, g35018 or g35017 gene 
disrupted by homologous recombination with a knock out vector, comprising a polynucleotide 
according to any one of claims 1 or 3. 

10. Use of a polynucleotide comprising a contiguous span of at least 12 nucleotides of a
sequence selected from the group consisting of the SEQ ID Nos 1 to 26, 36 to 40 and 112 to 229
or the complementary sequence thereto for determining the identity of the nucleotide at a
biallelic marker selected from the group consisting of A1 to A69, A71 to A74, A76 to A94, A96
to A106, A108 to A112, A114 to A177, A179 to A197, A199 to A222, A224 to A246, A250,
A251, A253, A255, A259, A266, A268 to A232 and A328 to A489.

11. Use of a polynucleotide according to claim 10 in a microsequencing assay, wherein the 3'
end of said contiguous span is located at the 3' end of said polynucleotide and wherein the 3'
end of said polynucleotide is located 1 nucleotide upstream of a biallelic marker in said
sequence.

12. Use of a polynucleotide according to claim 10 in a hybridization assay, wherein said span 
includes a biallelic marker. 

13. Use of a polynucleotide according to claim 10 in a specific amplification assay, wherein
the 3' end of said contiguous span is located at the 3' end of said polynucleotide and said
biallelic marker is present at the 3' end of said polynucleotide.

14. Use of a polynucleotide according to claim 10 in a sequencing assay, wherein the 3' end
of said contiguous span is located at the 3' end of said polynucleotide.

15. A polynucleotide according to claim 11, wherein said polynucleotide consists essentially of
a sequence selected from the group consisting of D1 to D69, D71 to D74, D76 to D94, D96 to
D106, D108 to D112, D114 to D177, D179 to D197, D199 to D222, D224 to D246, D250,
D251, D253, D255, D259, D266, D268 to D232 and D328 to D360.

16. A polynucleotide according to claim 2, consisting essentially of a sequence selected from
the group consisting of: B1 to B22, C1 to C22, B1 to B229, C1 to C229, and P1 to P69,
P71 to P74, P76 to P94, P96 to P106, P108 to P112, P114 to P177, P179 to P197, P199 to P222,
P224 to P246, P250, P251, P253, P255, P259, P266, P268 to P232 and P328 to P360.

17. Use of a polynucleotide comprising a contiguous span of at least 12 nucleotides of a
sequence selected from the group consisting of the SEQ ID Nos 1 to 26, 36 to 40 and 112 to 229

biallelic marker selected from the group consisting of A1 to A69, A71 to A74, A76 to A94, A96

to A106, A108 to A112, A114 to A177, A179 to A197, A199 to A222, A224 to A246, A250,
A251, A253, A255, A259, A266, A268 to A232 and A328 to A489.

18. A polynucleotide according to any one of claims 1 to 4, 15, or 16 attached to a solid support.

19. An array of polynucleotides comprising at least one polynucleotide according to claim 18.

20. An array according to claim 19, wherein said array is addressable.

21. A polynucleotide according to any one of claims 1 to 4, 15, 16 or 18 to 20 further
comprising a label.

22. A method of genotyping comprising determining the identity of a nucleotide at a biallelic
marker selected from the group consisting of A1 to A69, A71 to A74, A76 to A94, A96 to

A106, A108 to A112, A114 to A177, A179 to A197, A199 to A222, A224 to A246, A250

in a biological sample.

23. A method of genotyping comprising determining the identity of a nucleotide at a
Region D-related biallelic marker, or the complement thereof, in a biological sample.

24. A method according to claim 22, wherein said biological sample is derived from a single
subject.

25. A method according to claim 22, wherein said biological sample is derived from multiple
subjects.

26. A method according to claim 22, further comprising amplifying a portion of said sequence
comprising the biallelic marker prior to said determining step.

27. A method according to claim 26, wherein said amplifying is performed by PCR. 

28. A method according to claim 22, wherein said determining is performed by a hybridization
assay.



29. A method according to claim 22, wherein said determining is performed by a sequencing
assay.

30. A method according to claim 22, wherein said determining is performed by a
microsequencing assay.

31. A method according to claim 22, wherein said determining is performed by an
enzyme-based mismatch detection assay.

32. A method of estimating the frequency of an allele of a biallelic marker in a population
comprising:

a) genotyping individuals from said population for said biallelic marker according to the method
of claim 22; and

b) determining the proportional representation of said biallelic marker in said population.



33. A method of detecting an association between a genotype and a trait, comprising the steps
of:

a) determining the frequency of at least one biallelic marker in a trait positive population
according to the method of claim 32;

b) determining the frequency of at least one biallelic marker in a control population according to
the method of claim 32; and

c) determining whether a statistically significant association exists between said genotype and
said trait.
34. A method of estimating the frequency of a haplotype for a set of biallelic markers in a
population, comprising:

a) genotyping at least one biallelic marker according to claim 22 for each individual in
said population;

b) genotyping a second biallelic marker by determining the identity of the nucleotides at
said second biallelic marker for both copies of said second biallelic marker present in the
genome of each individual in said population; and

c) applying a haplotype determination method to the identities of the nucleotides
determined in steps a) and b) to obtain an estimate of said frequency.

35. A method according to claim 34, wherein said haplotype determination method is selected
from the group consisting of asymmetric PCR amplification, double PCR amplification of
specific alleles, the Clark method, and an expectation maximization algorithm.

36. A method of detecting an association between a haplotype and a trait, comprising the steps
of:

a) estimating the frequency of at least one haplotype in a trait positive population according to
the method of claim 34;

b) estimating the frequency of said haplotype in a control population according to the method of
claim 34; and

c) determining whether a statistically significant association exists between said haplotype and
said trait.

37. A method according to claim 33, wherein said genotyping of step a) is performed on each 
individual of said population. 

38. A method according to claim 33, wherein said genotyping is performed on a single pooled
biological sample derived from said population. 

39. A method of detecting an association between an allele and a phenotype, comprising the
steps of:

a) determining the frequency of at least one biallelic marker allele in a trait positive population
according to the method of claim 32;

b) determining the frequency of said biallelic marker allele in a control population according to
the method of claim 32; and

c) determining whether a statistically significant association exists between said allele and said
phenotype.

40. A method according to claim 33 or 36, wherein said trait is schizophrenia or bipolar 
disorder. 

41. A method according to claim 33 or 36, wherein said trait is predisposition to schizophrenia 
or bipolar disorder, an early onset of schizophrenia or bipolar disorder, or a beneficial response 
to or side effects related to treatment against schizophrenia or bipolar disorder. 

42. A method according to claim 39, wherein said phenotype is a symptom of schizophrenia or 
bipolar disorder. 

43. A method according to claim 33 or 36, wherein said control population is a trait negative 
population. 

44. A method according to claim 33 or 36, wherein said case control population is a random 
population. 

45. A method of determining whether an individual is at risk of schizophrenia or bipolar 
disorder, comprising: 

a) genotyping at least one biallelic marker according to the method of claim 22; and 

b) correlating the result of step a) with a risk of developing schizophrenia or bipolar disorder. 

46. A method according to any one of claims 22, 32 to 34, 36, 39 and 45, wherein said biallelic
marker is selected from the group consisting of A1 to A69, A71 to A74, A76 to A94, A96 to
A106, A108 to A112, A114 to A177, A179 to A197, A199 to A222, A224 to A242, A250 to
A251, A259, A269 to A270, A278, A285 to A295, A303 to A307, A330, A334 to A335, A346
to A357 and A361 to A489, and the complements thereof.

47. A diagnostic kit comprising a polynucleotide according to any one of claims 1 to 4, 15, 16 
or 18 to 20. 



48. A purified or isolated sbgl or g35018 polypeptide which is encoded by a nucleic acid
comprising a nucleotide sequence selected from the group consisting of SEQ ID Nos 2 to 26
and 36 to 40, and fragments or variants thereof.

49. A purified or isolated sbgl or g35018 polypeptide comprising at least 6 contiguous amino
acid residues of any of SEQ ID Nos 27 to 35 and 41 to 43.

50. A purified or isolated sbgl or g35018 polypeptide according to claim 49, comprising at least 
one amino acid substitution, addition or deletion. 

51. A purified or isolated sbgl peptide consisting essentially of an amino acid position range 
selected from the group consisting of: 

1 to 63 and 64 to 102 of SEQ ID No 29; 1 to 64, 65 to 111 and 112 to 119 of SEQ ID No 30;
1 to 64 and 65 to 126 of SEQ ID No 32; 1 to 64, 65 to 123 and 124 to 153 of SEQ ID No 34;
and 1 to 61 and 65 to 106 of SEQ ID No 35.

52. A method for producing an sbgl or g35018 polypeptide, wherein said method comprises the 
following steps: 

a) providing a cell host comprising a recombinant vector according to claim 5 containing a
nucleic acid encoding an sbgl or g35018 polypeptide;

b) recovering the sbgl or g35018 polypeptide produced by said recombinant cell host. 

53. The method according to claim 52, wherein the recombinant cell host is a recombinant cell 
host according to claim 6. 

54. An isolated or purified antibody composition capable of selectively binding to a polypeptide
according to any one of claims 49 to 51.

55. A method for specifically detecting the presence of an sbgl or g35018 polypeptide in a
biological sample, said method comprising the following steps:

a) bringing into contact the biological sample with an antibody directed against an sbgl or g35018
polypeptide according to any one of claims 49 to 51;

b) detecting the antigen-antibody complex formed between said antibody and said polypeptide. 

56. A diagnostic kit for detecting in vitro the presence of an sbgl or g35018 polypeptide in a
biological sample, said kit comprising:

a) a polyclonal or monoclonal antibody directed against an sbgl or g35018 polypeptide according
to any one of claims 49 to 51, or a fragment thereof, optionally labeled;

b) a reagent allowing the detection of the antigen-antibody complexes formed between said sbgl or 
g35018 polypeptide and an antibody. 

57. A method for the screening of a candidate substance, wherein said method comprises the 
following steps: 

a) providing a polypeptide according to any one of claims 49 to 51;

b) obtaining a candidate substance; 

c) bringing into contact said polypeptide with said candidate substance; 

d) detecting the complexes formed between said polypeptide and said candidate substance. 

58. The method of claim 57, wherein at step d), the complexes formed are incubated in the
presence of a polyclonal or a monoclonal antibody according to claim 54.

59. A kit for screening a candidate substance interacting with an sbgl or g35018 polypeptide, 
wherein said kit comprises: 

a) a polypeptide according to any one of claims 49 to 51;

b) optionally a monoclonal or a polyclonal antibody according to claim 54. 

60. A method for the screening of a candidate substance, where said method comprises the 
following steps: 

a) cultivating a prokaryotic or a eukaryotic cell that has been transfected with a nucleotide
sequence encoding an sbgl protein or a variant or a fragment thereof, placed under the control 
of its own promoter; 

b) bringing into contact the cultivated cell with a molecule to be tested; 

c) quantifying the expression of the sbgl protein or a variant or a fragment thereof. 

61. A method for identifying a compound for the treatment of a disease, wherein said method
comprises the following steps:

(a) exposing an animal to a level of sbgl activity sufficient to cause a schizophrenia-related or 
bipolar disorder-related symptom or endpoint, and 

(b) exposing said animal to a test compound. 

62. A method according to claim 61, wherein said animal is a non-human mammal. 

63. A method according to claim 61, wherein said animal is a non-human primate.

64. A method according to claim 61, wherein said animal is treated with an sbgl polypeptide
according to any one of claims 49 to 51.

65. A computer readable medium having stored thereon a sequence selected from the group 
consisting of a nucleic acid code comprising one of the following: 

a) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1, and the complements thereof, wherein
said contiguous span comprises at least one of the following nucleotide positions of SEQ ID No
1: 31 to 292651 and 292844 to 319608;

b) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos. 54 to 229, and the complements
thereof, to the extent that such a length is consistent with the particular sequence ID;

c) a contiguous span of at least 8, 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
80, 90, 100 or 200 nucleotides, to the extent that such a length is consistent with the particular
sequence ID, of SEQ ID Nos. 2 to 26, 36 to 40, or the complements thereof;

d) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80,
90 or 100 nucleotides of SEQ ID No. 1 or the complements thereof, wherein said contiguous
span comprises at least one of the following nucleotide positions of SEQ ID No 1:

(i) 292653 to 296047, 292653 to 292841, 295555 to 296047 and 295580 to 296047;

(ii) 31 to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862,
25593 to 25740, 29388 to 29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 and 65854
to 67854;

(iii) 94124 to 94964; 

(iv) 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952,
216661 to 217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to
231412, 231787 to 231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719
to 239807, 239719 to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617,
240528 to 240644, 240528 to 240824, 240528 to 240994, 240528 to 241685, 240800 to 240993
and 241686 to 243685; and

(v) 201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746
and 216836 to 216915;

e) a contiguous span according to a), b), c) or d), wherein said span includes a biallelic
marker selected from the group consisting of A1 to A489;

f) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1 or the complements thereof, wherein said
contiguous span comprises at least 1, 2, 3, 5, or 10 nucleotide positions of any one of the ranges of
nucleotide positions designated pos1 to pos166 of SEQ ID No. 1 listed in Table 1 above;

g) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos. 2 to 26, 36 to 40 and 54 to 229, and
the complements thereof, wherein said span includes a chromosome 13q31-q33-related biallelic
marker, a Region D-related biallelic marker, an sbgl-, g34665-, sbg2-, g35017- or
g35018-related biallelic marker;

h) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos 2 to 26, 36 to 40 and 54 to 229, and
the complements thereof, wherein said span includes a chromosome 13q31-q33-related biallelic
marker, a Region D-related biallelic marker, an sbgl-, g34665-, sbg2-, g35017- or
g35018-related biallelic marker with the alternative allele present at said biallelic marker;

i) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1, and the complements thereof,
wherein said span includes a polymorphism selected from the group consisting of A1 to A69,
A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to
A222, A224 to A242 and A361 to A489; and

j) a nucleotide sequence complementary to any one of the contiguous spans of a), b), c), d),
e), f), g), h) and i).
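The span condition recited in item a) of claim 65 (a contiguous span of a minimum length that comprises at least one nucleotide position from the listed ranges of SEQ ID No. 1) amounts to an interval-overlap test. The following Python sketch is illustrative only; it assumes 1-based, inclusive coordinates on SEQ ID No. 1, and the names `CLAIM_65A_RANGES`, `span_covers_range` and `qualifies` are hypothetical.

```python
# Ranges of SEQ ID No. 1 positions recited in item a) of claim 65
# (assumed to be 1-based and inclusive).
CLAIM_65A_RANGES = [(31, 292651), (292844, 319608)]


def span_covers_range(start, end, ranges):
    """True if the inclusive span [start, end] shares at least one
    position with any inclusive (low, high) range."""
    return any(start <= high and end >= low for low, high in ranges)


def qualifies(start, end, min_length=12, ranges=CLAIM_65A_RANGES):
    """Hypothetical check: the span is at least min_length nucleotides
    long and comprises at least one of the listed positions."""
    length = end - start + 1
    return length >= min_length and span_covers_range(start, end, ranges)
```

For example, the span 292830 to 292850 (21 nucleotides) reaches position 292844 and satisfies the test, whereas the span 292652 to 292843 lies entirely in the gap between the two ranges and does not.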

66. A computer readable medium having stored thereon a sequence consisting of a polypeptide
code comprising a contiguous span of at least 6 amino acids of a polypeptide sequence selected
from the group consisting of SEQ ID Nos. 27 to 35 and 41 to 43.

67. A computer system comprising a processor and a data storage device, wherein said data
storage device comprises a computer readable medium according to any one of claims 65 or 66.

68. A computer system according to claim 67, further comprising a sequence comparer and a 
data storage device having reference sequences stored thereon. 






69. A computer system of claim 68 wherein said sequence comparer comprises a computer 

70. A computer system of claim 68, further comprising an identifier which identifies features in
said sequence. 

71. A method for comparing a first sequence to a reference sequence, comprising the steps of:

a) reading said first sequence and said reference sequence through use of a computer program
which compares sequences; and 

b) determining differences between said first sequence and said reference sequence with said 
computer program, 

wherein said first sequence is selected from the group consisting of a nucleic acid code 
comprising one of the following: 

(i) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1, and the complements thereof, wherein
said contiguous span comprises at least one of the following nucleotide positions of SEQ ID No
1: 31 to 292651 and 292844 to 319608;

(ii) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos. 54 to 229, and the complements
thereof, to the extent that such a length is consistent with the particular sequence ID;

(iii) a contiguous span of at least 8, 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
80, 90, 100 or 200 nucleotides, to the extent that such a length is consistent with the particular
sequence ID, of SEQ ID Nos. 2 to 26, 36 to 40, or the complements thereof;

(iv) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
80, 90 or 100 nucleotides of SEQ ID No. 1 or the complements thereof, wherein said contiguous
span comprises at least one of the following nucleotide positions of SEQ ID No 1:

a) 292653 to 296047, 292653 to 292841, 295555 to 296047 and 295580 to 296047;

b) 31 to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593
to 25740, 29388 to 29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 and 65854 to
67854;

c) 94124 to 94964; 

d) 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952,
216661 to 217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to
231412, 231787 to 231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719
to 239807, 239719 to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617,
240528 to 240644, 240528 to 240824, 240528 to 240994, 240528 to 241685, 240800 to 240993
and 241686 to 243685; and

e) 201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and
216836 to 216915;

(v) a contiguous span according to (i) to (iv), wherein said span includes a biallelic marker
selected from the group consisting of A1 to A489;

(vi) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1 or the complements thereof, wherein said
contiguous span comprises at least 1, 2, 3, 5, or 10 nucleotide positions of any one of the ranges of
nucleotide positions designated pos1 to pos166 of SEQ ID No. 1 listed in Table 1;

(vii) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos. 2 to 26, 36 to 40, 54 to 229, and the
complements thereof, wherein said span includes a chromosome 13q31-q33-related biallelic
marker, a Region D-related biallelic marker, an sbgl-, g34665-, sbg2-, g35017- or
g35018-related biallelic marker;

(viii) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos 2 to 26, 36 to 40, 54 to 229, and the
complements thereof, wherein said span includes a chromosome 13q31-q33-related biallelic
marker, a Region D-related biallelic marker, an sbgl-, g34665-, sbg2-, g35017- or
g35018-related biallelic marker with the alternative allele present at said biallelic marker;

(ix) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1, and the complements thereof,
wherein said span includes a polymorphism selected from the group consisting of A1 to A69,
A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to
A222, A224 to A242 and A361 to A489;

(x) a nucleotide sequence complementary to any one of the contiguous spans of (i) to (ix);
and

(xi) a polypeptide code comprising a contiguous span of at least 6 amino acids of a 
polypeptide sequence selected from the group consisting of SEQ ID Nos 27 to 35 and 41 to 43. 

72. A method according to claim 71, wherein said step of determining differences between the
first sequence and the reference sequence comprises identifying at least one polymorphism.
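As a non-limiting illustration of the determining step of claims 71 and 72, the Python sketch below reports single-nucleotide differences between a first sequence and a reference sequence. It assumes the two sequences have already been aligned to the same length without gaps; a practical sequence comparer would normally perform an alignment first, and the function name `find_differences` and its `offset` parameter are hypothetical.

```python
def find_differences(first, reference, offset=1):
    """Hypothetical sketch: list (position, reference_base, first_base)
    for every mismatch between two pre-aligned, equal-length sequences.

    `offset` is the coordinate assigned to the first base, so positions
    can be reported on the reference numbering (for example SEQ ID No. 1)."""
    if len(first) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    differences = []
    for i, (base, ref_base) in enumerate(zip(first.upper(), reference.upper())):
        if base != ref_base:
            # Each mismatch is a candidate polymorphism at this position.
            differences.append((i + offset, ref_base, base))
    return differences
```

For instance, `find_differences("ACGTTA", "ACGCTA")` returns `[(4, "C", "T")]`, i.e. a candidate C/T polymorphism at position 4. Comparison is case-insensitive, so soft-masked lowercase bases are not reported as differences.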

73. A method for identifying a feature in a sequence, comprising the steps of:

a) reading said sequence through the use of a computer program which identifies features in
sequences; and

b) identifying features in said sequence with said computer program;

wherein said sequence is selected from the group consisting of a nucleic acid code 
comprising one of the following: 

(i) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1, and the complements thereof, wherein
said contiguous span comprises at least one of the following nucleotide positions of SEQ ID No
1: 31 to 292651 and 292844 to 319608;

(ii) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos. 54 to 229, and the complements
thereof, to the extent that such a length is consistent with the particular sequence ID;

(iii) a contiguous span of at least 8, 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
80, 90, 100 or 200 nucleotides, to the extent that such a length is consistent with the particular
sequence ID, of SEQ ID Nos. 2 to 26, 36 to 40, or the complements thereof;

(iv) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
80, 90 or 100 nucleotides of SEQ ID No. 1 or the complements thereof, wherein said contiguous
span comprises at least one of the following nucleotide positions of SEQ ID No 1:

a) 292653 to 296047, 292653 to 292841, 295555 to 296047 and 295580 to 296047;

b) 31 to 1107, 1108 to 65853, 1108 to 1289, 14877 to 14920, 18778 to 18862, 25593
to 25740, 29388 to 29502, 29967 to 30282, 64666 to 64812, 65505 to 65853 and 65854 to
67854;

c) 94124 to 94964; 

d) 213818 to 215818, 215819 to 215941, 215819 to 215975, 216661 to 216952,
216661 to 217061, 217027 to 217061, 229647 to 229742, 230408 to 230721, 231272 to
231412, 231787 to 231880, 231870 to 231879, 234174 to 234321, 237406 to 237428, 239719
to 239807, 239719 to 239853, 240528 to 240569, 240528 to 240596, 240528 to 240617,
240528 to 240644, 240528 to 240824, 240528 to 240994, 240528 to 241685, 240800 to 240993
and 241686 to 243685; and

e) 201188 to 216915, 201188 to 201234, 214676 to 214793, 215702 to 215746 and
216836 to 216915;

(v) a contiguous span according to (i) to (iv), wherein said span includes a biallelic marker
selected from the group consisting of A1 to A489;

(vi) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1 or the complements thereof, wherein said
contiguous span comprises at least 1, 2, 3, 5, or 10 nucleotide positions of any one of the ranges of
nucleotide positions designated pos1 to pos166 of SEQ ID No. 1 listed in Table 1;

(vii) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos. 2 to 26, 36 to 40, 54 to 229, and the
complements thereof, wherein said span includes a chromosome 13q31-q33-related biallelic
marker, a Region D-related biallelic marker, an sbgl-, g34665-, sbg2-, g35017- or
g35018-related biallelic marker;

(viii) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of any of SEQ ID Nos. 2 to 26, 36 to 40, 54 to 229, and the
complements thereof, wherein said span includes a chromosome 13q31-q33-related biallelic
marker, a Region D-related biallelic marker, an sbgl-, g34665-, sbg2-, g35017- or
g35018-related biallelic marker with the alternative allele present at said biallelic marker;

(ix) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, 1000 or 2000 nucleotides of SEQ ID No. 1, and the complements thereof,
wherein said span includes a polymorphism selected from the group consisting of A1 to A69,
A71 to A74, A76 to A94, A96 to A106, A108 to A112, A114 to A177, A179 to A197, A199 to
A222, A224 to A242 and A361 to A489;

(x) a nucleotide sequence complementary to any one of the contiguous spans of (i) to (ix); 
and 

(xi) a polypeptide code comprising a contiguous span of at least 6 amino acids of a 
polypeptide sequence selected from the group consisting of SEQ ID Nos 27 to 35 and 41 to 43. 

Figure 4

Markers involved in selected haplotypes